Abstract of WO 97/31114

This invention relates to Staphylococcal polynucleotides, polypeptides encoded by such polynucleotides, 
the uses of such polynucleotides and polypeptides, as well as the production of such polynucleotides and 
polypeptides and recombinant host cells transformed with the polynucleotides. This invention also relates 
to inhibiting the biosynthesis or action of such polynucleotides or polypeptides and to the use of such 
inhibitors in therapy. 
WO 97/31114 PCT/GB97/00524

POLYNUCLEOTIDES AND AMINOACID SEQUENCES FROM STAPHYLOCOCCUS AUREUS 
FIELD OF THE INVENTION 

This invention relates to newly identified polynucleotides, polypeptides encoded by 
such polynucleotides, the use of such polynucleotides and polypeptides, as well as the
production of such polynucleotides and polypeptides and recombinant host cells
transformed with the polynucleotides. This invention also relates to inhibiting the 
biosynthesis or action of such polypeptides and to the use of inhibitors in therapy. 
BACKGROUND OF THE INVENTION 

The Staphylococci make up a medically important genus of microbes. They are
known to produce two types of disease, invasive and toxigenic. Invasive infections are
characterized generally by abscess formation affecting both skin surfaces and deep tissues.
Staphylococcus aureus is the second leading cause of bacteremia in cancer patients.
Osteomyelitis, septic arthritis, septic thrombophlebitis and acute bacterial endocarditis are
also relatively common. There are at least three clinical conditions resulting from the
toxigenic properties of Staphylococci. The manifestation of these diseases results from the
actions of exotoxins as opposed to tissue invasion and bacteremia. These conditions
include: Staphylococcal food poisoning, scalded skin syndrome and toxic shock syndrome.

While certain Staphylococcal proteins associated with pathogenicity have been
identified, e.g., coagulase, hemolysins, leucocidins and exo- and enterotoxins, very little is
known concerning the temporal expression of genes of bacterial pathogens during infection
and disease progression in a mammalian host. Discovering the sets of genes the bacterium
is likely to be expressing at the different stages of infection, particularly when an infection
is established, provides critical information for the screening and characterization of novel
antibacterials which can interrupt pathogenesis, by identifying possible previously
unrecognised targets.

Recently several novel approaches have been described which purport to follow
global gene expression during infection (Chuang, S. et al. [1993] Global Regulation of
Gene Expression in Escherichia coli, J. Bacteriol. 175, 2026-2036; Mahan, M.J. et al.
[1993] Selection of Bacterial Virulence Genes That Are Specifically Induced in Host
Tissues, SCIENCE 259, 686-688; Hensel, M. et al. [1995] Simultaneous Identification of
Bacterial Virulence Genes by Negative Selection, SCIENCE 269, 400-403). These new
techniques have so far been demonstrated with gram negative pathogen infections and not
with infections with gram positives, presumably due to the much slower development of
global transposon mutagenesis and suitable vectors needed for these strategies in these
organisms and, in the case of the process described by Chuang, S. et al. [1993], the
difficulty of isolating suitable quantities of bacterial RNA free of mammalian RNA derived
from the infected tissue to furnish bacterial RNA labelled to sufficiently high specific
activity. The present invention employs a novel technology to determine gene expression in
the pathogen at different stages of infection of the mammalian host.
DETAILED DESCRIPTION OF THE INVENTION

A novel aspect of this invention is the use of a suitably labelled oligonucleotide 
probe which anneals specifically to the bacterial ribosomal RNA in Northern blots of 
bacterial RNA preparations from infected tissue. Using the more abundant ribosomal RNA 
as a hybridisation target greatly facilitates the optimisation of a protocol to purify bacterial 
RNA of a suitable size and quantity for RT-PCR from infected tissue. Techniques reported 
in the scientific literature which are of use in purifying Staphylococcus aureus RNA from 
bacteria grown in vitro are unsuccessful when applied to infected tissue. 

In a first aspect therefore, the invention provides a method of identifying genes 
transcribed in an organism in infected host tissue by identifying mRNA present using
RT-PCR, characterised in that a bacterial mRNA preparation is obtained from total RNA from
infected tissue by enriching for bacterial RNA by a suitable bacterial disruption technique
in order to selectively damage mammalian RNA and at the same time give sufficient
quantities of bacterial RNA for RT-PCR, and wherein the conditions for selectively
enriching for bacterial RNA are determined by probing with an oligonucleotide probe 
specific to bacterial ribosomal RNA. 

This process of optimisation preferably uses a unique labelled oligonucleotide
probe to bacterial ribosomal RNA which is used in Northern experiments against the
experimental RNA preparations to determine those conditions which give optimal levels of
bacterial RNA. As bacterial ribosomal RNA is present in amounts 2-4 orders of magnitude
greater than bacterial mRNA species, this detection procedure provides a suitably sensitive
indication of the existence and quantity of bacterial RNA in the presence of the vastly
greater levels of mammalian RNA from the infected tissue. This detection system may be
used in conjunction with the visualisation of total RNA by ethidium bromide staining of the 1%
agarose gels on which it has been run out. On these gels mammalian ribosomal RNA
migrates at a different rate to bacterial ribosomal RNA and so can be identified.
Surprisingly, those disruption conditions which were found to just lead to the loss of
mammalian RNA gave the best preparations of bacterial RNA as judged by the Northern
experiment. A suitable oligonucleotide useful for applying this method to genes expressed
in Staphylococcus aureus is 5'-gctcctaaaaggttactccaccggc-3' [SEQ ID NO: 91].
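
By way of illustration only, and not as part of the disclosed method, the basic properties of the probe given above can be checked with a few lines of Python. The sketch reports the probe length and G+C fraction and derives its reverse complement, which corresponds (in DNA notation) to the stretch of ribosomal RNA the probe would be expected to anneal to. The function names are hypothetical and chosen only for this example.

```python
# Probe sequence as given in the text (SEQ ID NO: 91), written 5'->3'.
PROBE = "gctcctaaaaggttactccaccggc"

# Translation table mapping each base to its Watson-Crick partner.
_COMPLEMENT = str.maketrans("acgtACGT", "tgcaTGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Return the fraction of bases that are G or C."""
    s = seq.lower()
    return (s.count("g") + s.count("c")) / len(s)


if __name__ == "__main__":
    print("length:", len(PROBE), "nt")
    print("G+C fraction:", round(gc_fraction(PROBE), 2))
    print("target region (DNA notation, 5'->3'):", reverse_complement(PROBE))
```

The same helper could be applied to any candidate probe before labelling; it makes no claim about hybridisation specificity, which would still have to be established experimentally as described above.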

Use of the technology of the present invention enables identification of bacterial
genes transcribed during infection, inhibitors of which would have utility in anti-bacterial
therapy. Specific inhibitors of such gene transcription or of the subsequent translation of
the resultant mRNA or of the function of the corresponding expressed proteins would have
utility in anti-bacterial therapy.

The present invention provides a polynucleotide having the DNA sequence given in
any of the sequences set forth in, or selected from the group consisting essentially of,
SEQUENCE 1 [SEQ ID Nos: 1,4,7,10,13,16,19,22,25,28,
31,34,37,40,43,46,49,52,55,58,61,64,67,70,73,76] of Table I, or any combination of
the sequences thereof. The invention further provides a polynucleotide encoding a protein
from S. aureus WCUH 29 and characterized in that it comprises the DNA sequence given
in any of the sequences set forth in SEQUENCE 1 [SEQ ID Nos:
1,4,7,10,13,16,19,22,25,28,31,34,37,40,43,46,49,52,55,58,61,64,
67,70,73,76] of Table I. The polynucleotides having the DNA sequence given in each
sequence set forth in SEQUENCE 1 [SEQ ID Nos: 1,4,7,10,13,16,19,22,25,28,
31,34,37,40,43,46,49,52,55,58,61,64,67,70,73,76] of Table I were obtained from the
sequencing of a library of clones of chromosomal DNA of S. aureus WCUH 29 in E. coli.

S. aureus WCUH 29 has been deposited at the National Collection of Industrial and
Marine Bacteria Ltd. (NCIMB), Aberdeen, Scotland under number NCIMB 40771 on 11
September 1995.

The present invention also provides a novel protein from Staphylococcus aureus
WCUH29 obtainable by expression of a gene characterised in that it comprises the DNA
sequence given in any of the sequences set forth in SEQUENCE 1 [SEQ ID Nos: 1,4,7,10,
13,16,19,22,25,28,31,34,37,40,43,46,49,52,55,58,61,64,67,70,73,76] of Table I, or a
fragment, analogue or derivative thereof.

The present invention further relates to a novel protein from Staphylococcus
aureus WCUH29, characterised in that it comprises the amino acid sequence given in any
of the sequences set forth in, or selected from the group consisting essentially of,
SEQUENCE 2 [SEQ ID Nos: 79,80,81,82,83,84,85,86,87,88,89,90] of Table 1, or a
fragment, analogue or derivative thereof.

The invention also relates to a polypeptide fragment of the protein, having the 
amino acid sequence given in any of the sequences set forth in SEQUENCE 2 [SEQ ID 
Nos: 79,80,81,82,83,84,85,86,87,88,89,90] of Table 1, or a derivative thereof.

Hereinafter the term polypeptide(s) will be used to refer to the protein and its 
fragments, analogues or derivatives. 

In accordance with another aspect of the present invention, there are provided 
polynucleotides (DNA or RNA) which encode such polypeptides. 

The invention also relates to novel oligonucleotides, including the sequences set 
forth in SEQUENCE 3 [SEQ ID Nos: 2,5,8,11,14,17,20,23,26,29,32,35,38,41,44,47,
50,53,56,59,62,65,68,71,74,77] and 4 [SEQ ID Nos: 3,6,9,12,15,18,21,24,27,30,
33,36,39,42,45,48,51,54,57,60,63,66,69,72,75,78] of Table I, derived from the
sequences set forth in SEQUENCE 1 [SEQ ID Nos: 1,4,7,10,13,16,19,22,25,28,31,34, 
37,40,43,46,49,52,55,58,61,64,67,70,73,76] of Table I which can act as PCR primers in 
the process herein described to determine whether or not the Staphylococcus aureus genes 
identified herein in whole or in part are transcribed in infected tissue. It is recognised that 
such sequences will also have utility in diagnosis of the stage of infection and type of 
infection the pathogen has attained. 
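
The numbering used in the lists above follows a regular pattern, which the following illustrative Python sketch makes explicit. It assumes, from the numbering alone and not from any statement in the text, that each DNA sequence of SEQUENCE 1 is grouped with the two immediately following SEQ ID numbers as its primer pair (for example SEQ ID NO: 1 with SEQ ID Nos: 2 and 3). The function name is hypothetical.

```python
def seq_id_triplets(n_groups: int = 26):
    """Yield (dna_id, primer_a_id, primer_b_id) for each numbered group.

    Assumption: group n holds SEQ ID Nos 3n-2 (SEQUENCE 1),
    3n-1 (SEQUENCE 3) and 3n (SEQUENCE 4).
    """
    for n in range(1, n_groups + 1):
        dna_id = 3 * n - 2
        yield dna_id, dna_id + 1, dna_id + 2


# Rebuild the three lists quoted in the text from the pattern.
sequence_1 = [t[0] for t in seq_id_triplets()]
sequence_3 = [t[1] for t in seq_id_triplets()]
sequence_4 = [t[2] for t in seq_id_triplets()]

if __name__ == "__main__":
    print("SEQUENCE 1:", sequence_1)
    print("SEQUENCE 3:", sequence_3)
    print("SEQUENCE 4:", sequence_4)
```

Regenerating the lists in this way is a convenient check against transcription errors when the SEQ ID numbers are copied between documents.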

Each of the DNA sequences provided herein may be used in the discovery and 
development of antibacterial compounds. The encoded protein upon expression can be 
used as a target for the screening of antibacterial drugs. Additionally, the DNA sequences
encoding regions of the encoded protein or Shine-Dalgarno or other translation facilitating
sequences of the respective mRNA can be used to construct antisense sequences to control
the expression of the coding sequence of interest. Furthermore, many of the sequences
disclosed herein also provide regions upstream and downstream from the encoding 
sequence. These sequences are useful as a source of regulatory elements for the control of 
bacterial gene expression. Such sequences are conveniently isolated by restriction enzyme 
action or synthesized chemically and introduced, for example, into promoter identification 
strains. These strains contain a reporter structural gene sequence located downstream from 
a restriction site such that if an active promoter is inserted, the reporter gene will be 
expressed. 

Although each of the sequences may be employed as described above, this
invention also provides several means for identifying particularly useful target genes. The
first of these approaches entails searching appropriate databases for sequence matches.
Thus, if a homologue exists, the Staphylococcal-like form of this gene would likely play an
analogous role. For example, a Staphylococcal protein identified as homologous to a cell
surface protein in another organism would be useful as a vaccine candidate. To the extent
such homologies have been identified for the sequences disclosed herein they are reported
along with the encoding sequence.

To obtain the polynucleotide encoding the protein using any DNA sequence given
in a SEQ ID NO, typically a library of clones of chromosomal DNA of S. aureus WCUH
29 in E. coli or some other suitable host is probed with a radiolabeled oligonucleotide,
preferably a 17mer or longer, derived from the partial sequence. Clones carrying DNA
identical to that of the probe can then be distinguished using high stringency washes. By
sequencing the individual clones thus identified with sequencing primers designed from the
original sequence it is then possible to extend the sequence in both directions to determine
the full gene sequence. Conveniently such sequencing is performed using denatured double
stranded DNA prepared from a plasmid clone. Suitable techniques are described by
Maniatis, T., Fritsch, E.F. and Sambrook, J. in MOLECULAR CLONING, A Laboratory
Manual [2nd edition 1989 Cold Spring Harbor Laboratory, see Screening By Hybridization
1.90 and Sequencing Denatured Double-Stranded DNA Templates 13.70].
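
The text does not say how the high stringency wash conditions are chosen. As a rough, hedged illustration only, the following Python sketch applies the widely used Wallace rule of thumb for short oligonucleotides, in which the approximate melting temperature is 2 degrees C per A or T plus 4 degrees C per G or C. The 17mer shown is invented for the example and is not taken from any sequence disclosed herein.

```python
def wallace_tm(oligo: str) -> int:
    """Approximate melting temperature (deg C) by the Wallace rule.

    Tm = 2 * (A + T) + 4 * (G + C). This is only a rough guide for
    short oligonucleotides and ignores salt and probe concentration.
    """
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    if at + gc != len(s):
        raise ValueError("oligo must contain only A, C, G and T")
    return 2 * at + 4 * gc


# Hypothetical 17mer, invented purely for illustration.
EXAMPLE_17MER = "ATGGCTAAAGGTTTACC"

if __name__ == "__main__":
    print(len(EXAMPLE_17MER), "nt, approximate Tm:", wallace_tm(EXAMPLE_17MER), "deg C")
```

An estimate of this kind would only be a starting point; the wash temperature that distinguishes identical from mismatched clones would still be determined empirically.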

A polynucleotide of the present invention may be in the form of RNA or in the
form of DNA, which DNA includes cDNA, genomic DNA, and synthetic DNA. The DNA
may be double-stranded or single-stranded, and if single stranded may be the coding strand
or non-coding (anti-sense) strand. The coding sequence which encodes the polypeptide
may be identical to the coding sequence of any of the sequences of SEQUENCE 1 [SEQ ID
Nos: 1,4,7,10,13,16,19,22,25,28,31,34,37,40,43,46,49,52,55,58,61,64,67,70,73,76] of
Table I or may be a different coding sequence which coding sequence, as a result of the
redundancy or degeneracy of the genetic code, encodes the same polypeptide.
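
The degeneracy referred to above can be illustrated with a short Python sketch. It builds the standard genetic code table and shows that two different coding sequences translate to the same peptide. Both coding sequences are hypothetical examples and are not taken from Table I.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(_BASES, repeat=3), _AMINO)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence, codon by codon ('*' marks a stop)."""
    s = dna.upper()
    if len(s) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[s[i:i + 3]] for i in range(0, len(s), 3))


# Two hypothetical coding sequences that differ at third codon positions.
CODING_A = "ATGGCTAAA"
CODING_B = "ATGGCAAAG"

if __name__ == "__main__":
    print(CODING_A, "->", translate(CODING_A))
    print(CODING_B, "->", translate(CODING_B))
```

Because GCT and GCA both encode alanine and AAA and AAG both encode lysine, the two sequences differ at the nucleotide level yet specify the same polypeptide.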

The present invention includes variants of the hereinabove described
polynucleotides which encode fragments, analogues and derivatives of the polypeptides of
the invention, and in particular polypeptides characterized by the deduced amino acid
sequences set forth in each SEQUENCE 2 [SEQ ID Nos: 79,80,81,82,83,84,
85,86,87,88,89,90] of Table I. The variant of the polynucleotide may be a naturally
occurring allelic variant of the polynucleotide or a non-naturally occurring variant of the
polynucleotide.

Thus, the present invention includes polynucleotides encoding the same 
polypeptides of the invention, and in particular characterized by the deduced amino acid
sequences set forth in each SEQUENCE 2 [SEQ ID Nos: 79,80,81,82,83,84,85,86,87, 
88,89,90] of Table I as well as variants of such polynucleotides which variants encode for 
a fragment, derivative or analogue of the polypeptide. Such nucleotide variants include 
deletion variants, substitution variants and addition or insertion variants. 

The polynucleotide may have a coding sequence which is a naturally occurring
allelic variant of the coding sequence characterized by the DNA sequence of any of the
sequences set forth in Table I as SEQUENCE 1 [SEQ ID Nos: 1,4,7,10,13,16,19,22,25,
28,31,34,37,40,43,46,49,52,55,58,61,64,67,70,73,76]. As known in the art, an allelic
variant is an alternate form of a polynucleotide sequence which may have a substitution,
deletion or addition of one or more nucleotides, which does not substantially alter the
function of the encoded polypeptide.

The polynucleotide which encodes for the mature polypeptide may include only the 
coding sequence for the mature polypeptide or the coding sequence for the mature 
polypeptide and additional coding sequence such as a leader or secretory sequence or a
proprotein sequence.

Thus, the term "polynucleotide encoding a polypeptide" encompasses a 
polynucleotide which includes only coding sequence for the polypeptide as well as a 
polynucleotide which includes additional coding and/or non-coding sequence. 

The present invention therefore includes polynucleotides, wherein the coding
sequence for the mature polypeptide may be fused in the same reading frame to a
polynucleotide sequence which aids in expression and secretion of a polypeptide from a
host cell, for example, a leader sequence which functions as a secretory sequence for
controlling transport of a polypeptide from the cell. The polypeptide having a leader
sequence is a preprotein and may have the leader sequence cleaved by the host cell to form
the mature form of the polypeptide. The polynucleotides may also encode for a proprotein
which is the mature protein plus additional 5' amino acid residues. A mature protein having
a prosequence is a proprotein and is an inactive form of the protein. Once the prosequence
is cleaved an active mature protein remains.
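
The requirement that a leader sequence be fused "in the same reading frame" can be sketched as a simple check, shown below in Python purely for illustration. The sketch assumes that both parts are supplied as coding DNA starting at the first base of a codon; the function name and the example sequences are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def fuse_in_frame(leader: str, mature: str) -> str:
    """Join a leader coding sequence to a mature coding sequence.

    Raises ValueError if the fusion would shift the reading frame of the
    mature sequence or if the leader contains an in-frame stop codon.
    """
    leader, mature = leader.upper(), mature.upper()
    if len(leader) % 3 != 0:
        raise ValueError("leader length must be a multiple of 3 to keep frame")
    if len(mature) % 3 != 0:
        raise ValueError("mature coding sequence must be a whole number of codons")
    leader_codons = [leader[i:i + 3] for i in range(0, len(leader), 3)]
    if any(codon in STOP_CODONS for codon in leader_codons):
        raise ValueError("leader contains an in-frame stop codon")
    return leader + mature


if __name__ == "__main__":
    # Hypothetical two-codon leader fused to a hypothetical mature sequence.
    print(fuse_in_frame("ATGAAA", "GCTTAA"))
```

A leader of four bases, for instance, would be rejected because every codon of the mature sequence downstream of it would be read out of frame.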

Thus, for example, the polynucleotide of the present invention may encode for a
mature protein, or for a protein having a prosequence or for a protein having both a
prosequence and a presequence (leader sequence). Further, the amino acid sequences
provided herein show a methionine residue at the NH2-terminus. It is appreciated,
however, that during post-translational modification of the peptide, this residue may be
deleted. Accordingly, this invention contemplates the use of both sequences.

An expression vector is constructed so that the particular coding sequence is 
located in the vector with the appropriate regulatory sequences, the positioning and 
orientation of the coding sequence with respect to the control sequences being such that the 
coding sequence is transcribed under the "control" of the control sequences (i.e., RNA
polymerase which binds to the DNA molecule at the control sequences transcribes the 
coding sequence). Modification of the coding sequences may be desirable to achieve this 
end. For example, in some cases it may be necessary to modify the sequence so that it may 
be attached to the control sequences with the appropriate orientation; i.e., to maintain the 
reading frame. The control sequences and other regulatory sequences may be ligated to the 
coding sequence prior to insertion into a vector, such as the cloning vectors described 
above. Alternatively, the coding sequence can be cloned directly into an expression vector 
which already contains the control sequences and an appropriate restriction site. 

Generally, recombinant expression vectors will include origins of replication and
selectable markers permitting transformation of the host cell, e.g., the ampicillin resistance
gene of E. coli and S. cerevisiae TRP1 gene, and a promoter derived from a
highly-expressed gene to direct transcription of a downstream structural sequence. The
heterologous structural sequence is assembled in appropriate phase with translation
initiation and termination sequences, and preferably, a leader sequence capable of directing
secretion of translated protein into the periplasmic space or extracellular medium.
Optionally, the heterologous sequence can encode a fusion protein including an N-terminal
identification peptide imparting desired characteristics, e.g., stabilization or simplified
purification of expressed recombinant product.

The vector containing the appropriate DNA sequence as hereinabove described, as 
well as an appropriate promoter or control sequence, may be employed to transform an 
appropriate host to permit the host to express the protein. 

More particularly, the present invention also includes recombinant constructs 
comprising one or more of the sequences as broadly described above. The constructs 
comprise a vector, such as a plasmid or viral vector, into which a sequence of the invention
has been inserted, in a forward or reverse orientation. In a preferred aspect of this
embodiment, the construct further comprises regulatory sequences, including, for example,
a promoter, operably linked to the sequence. Large numbers of suitable vectors and
promoters are known to those of skill in the art, and are commercially available. The
following vectors are provided by way of example. Bacterial: pET-3 vectors (Stratagene),
pQE70, pQE60, pQE-9 (Qiagen), pbs, pD10, phagescript, psiX174, pbluescript SK, pbsks,
pNH8A, pNH16a, pNH18A, pNH46A (Stratagene); ptrc99a, pKK223-3, pKK233-3,
pDR540, pRIT5 (Pharmacia). Eukaryotic: pBlueBacIII (Invitrogen), pWLNEO,
pSV2CAT, pOG44, pXT1, pSG (Stratagene) pSVK3, pBPV, pMSG, pSVL (Pharmacia).
However, any other plasmid or vector may be used as long as they are replicable and viable 
in the host. 

Examples of recombinant DNA vectors for cloning and host cells which they can
transform include the bacteriophage λ (E. coli), pBR322 (E. coli), pACYC177 (E. coli),
pKT230 (gram-negative bacteria), pGV1106 (gram-negative bacteria), pLAFR1
(gram-negative bacteria), pME290 (non-E. coli gram-negative bacteria), pHV14 (E. coli and
Bacillus subtilis), pBD9 (Bacillus), pIJ61 (Streptomyces), pVC6 (Streptomyces), YIp5
(Saccharomyces), a baculovirus insect cell system, YCp19 (Saccharomyces). See,
generally, "DNA Cloning": Vols. I & II, Glover et al. ed. IRL Press Oxford (1985) (1987)
and T. Maniatis et al., "Molecular Cloning", Cold Spring Harbor Laboratory (1982).

Both the methionine-containing and the methionineless amino terminal variants of each
protein disclosed herein are contemplated.

The polynucleotides of the present invention may also have the coding sequence 
fused in frame to a marker sequence at either the 5' or 3' terminus of the gene which allows 
for purification of the polypeptide of the present invention. The marker sequence may be a
hexa-histidine tag supplied by the pQE series of vectors (supplied commercially by
Qiagen Inc.) to provide for purification of the polypeptide fused to the marker in the case
of a bacterial host. 

The present invention further relates to polynucleotides which hybridize to the
hereinabove-described sequences if there is at least 50% and preferably at least 70%
identity between the sequences. The present invention particularly relates to
Staphylococcal polynucleotides which hybridize under stringent conditions to the
hereinabove-described polynucleotides. As herein used, the term "stringent conditions"
means hybridization will occur only if there is at least 95% and preferably at least 97%
identity between the sequences. The polynucleotides which hybridize to the hereinabove
described polynucleotides in a preferred embodiment encode polypeptides which retain
substantially the same biological function or activity as the polypeptide of the invention. A
preferred embodiment of the invention is a polynucleotide having at least a 70%, 80%, 90%
or 95% identity to a polynucleotide encoding a polypeptide comprising an amino acid
sequence selected from the group consisting essentially of SEQ ID Nos:
79,80,81,82,83,84,85,86,87,88 and 89, or any combination of these amino acid sequences.
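
The text does not state how percentage identity is to be calculated. As a minimal sketch under the assumption that the two sequences have already been aligned without gaps and are of equal length, the identity thresholds mentioned above could be evaluated in Python as follows. The function names and example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of identical positions in two pre-aligned, equal-length sequences."""
    if not a or len(a) != len(b):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def meets_threshold(a: str, b: str, threshold: float = 95.0) -> bool:
    """True if the identity between a and b is at least the given percentage."""
    return percent_identity(a, b) >= threshold


if __name__ == "__main__":
    # Hypothetical 10-base sequences differing at one position.
    ref, var = "ACGTACGTAC", "ACGTACGTAA"
    print("identity:", percent_identity(ref, var), "%")
    print("at least 70%:", meets_threshold(ref, var, 70.0))
    print("at least 95%:", meets_threshold(ref, var, 95.0))
```

Real comparisons would normally use an alignment program that handles insertions and deletions; the ungapped comparison here is chosen only to keep the arithmetic visible.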

The deposit referred to herein will be maintained under the terms of the Budapest
Treaty on the International Recognition of the Deposit of Micro-organisms for purposes of
Patent Procedure. These deposits are provided merely as convenience to those of skill in
the art and are not an admission that a deposit is required under 35 U.S.C. § 112. The
sequence of the polynucleotides contained in the deposited material, as well as the amino
acid sequence of the polypeptides encoded thereby, are incorporated herein by reference
and are controlling in the event of any conflict with any description of sequences herein. A
license may be required to make, use or sell the deposited material, and no such license is
hereby granted.

The terms "fragment," "derivative" and "analogue" when referring to the 
polypeptide of the invention, means a polypeptide which retains essentially the same 
biological function or activity as such polypeptide. Thus, an analogue includes a proprotein
which can be activated by cleavage of the proprotein portion to produce an active mature
polypeptide. 

The polypeptide of the present invention may be a recombinant polypeptide, a
natural polypeptide or a synthetic polypeptide, preferably a recombinant polypeptide.

The fragment, derivative or analogue of the polypeptide of the invention may be (i)
one in which one or more of the amino acid residues are substituted with a conserved or
non-conserved amino acid residue (preferably a conserved amino acid residue) and such
substituted amino acid residue may or may not be one encoded by the genetic code, or (ii)
one in which one or more of the amino acid residues includes a substituent group, or (iii)
one in which the polypeptide is fused with another compound, such as a compound to
increase the half-life of the polypeptide (for example, polyethylene glycol), or (iv) one in
which the additional amino acids are fused to the polypeptide, such as a leader or secretory
sequence or a sequence which is employed for purification of the polypeptide or a
proprotein sequence. Such fragments, derivatives and analogues are deemed to be within
the scope of those skilled in the art from the teachings herein.

The polypeptides and polynucleotides of the present invention are preferably 
provided in an isolated form, and preferably are purified to homogeneity. 

The term "isolated" means that the material is removed from its original 
environment (e.g., the natural environment if it is naturally occurring). For example, a 
naturally-occurring polynucleotide or polypeptide present in a living animal is not isolated, 
but the same polynucleotide or polypeptide, separated from some or all of the coexisting 
materials in the natural system, is isolated. Such polynucleotides could be part of a vector 
and/or such polynucleotides or polypeptides could be part of a composition, and still be 
isolated in that such vector or composition is not part of its natural environment. 

The present invention also relates to vectors which include polynucleotides of the 
present invention, host cells which are genetically engineered with vectors of the invention 
and the production of polypeptides of the invention by recombinant techniques. 

In accordance with yet a further aspect of the present invention, there is therefore 
provided a process for producing the polypeptide of the invention by recombinant 
techniques by expressing a polynucleotide encoding said polypeptide in a host and 
recovering the expressed product. Alternatively, the polypeptides of the invention can be 
synthetically produced by conventional peptide synthesizers. 

Host cells are genetically engineered (transduced or transformed or transfected) 
with the vectors of this invention which may be, for example, a cloning vector or an 
expression vector. The vector may be, for example, in the form of a plasmid, a cosmid, a 
phage, etc. The engineered host cells can be cultured in conventional nutrient media 
modified as appropriate for activating promoters, selecting transformants or amplifying the 
genes. The culture conditions, such as temperature, pH and the like, are those previously 
used with the host cell selected for expression, and will be apparent to the ordinarily skilled 
artisan. 

Suitable expression vectors include chromosomal, nonchromosomal and synthetic 
DNA sequences, e.g., bacterial plasmids; phage DNA; baculovirus; yeast plasmids; vectors 
derived from combinations of plasmids and phage DNA. However, any other vector may 
be used as long as it is replicable and viable in the host. 

The appropriate DNA sequence may be inserted into the vector by a variety of
procedures. In general, the DNA sequence is inserted into an appropriate restriction 
endonuclease site(s) by procedures known in the art. 
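
As an illustration of locating a restriction endonuclease site before insertion, the following Python sketch scans a sequence for a recognition site. EcoRI (recognition sequence GAATTC) is used only as a familiar example; the text does not name any particular enzyme, and the vector fragment shown is hypothetical.

```python
def find_sites(dna: str, site: str = "GAATTC") -> list[int]:
    """Return the 0-based start position of every occurrence of a recognition site.

    Overlapping occurrences are reported; only the given strand is scanned,
    which is sufficient for palindromic sites such as GAATTC.
    """
    s, site = dna.upper(), site.upper()
    positions = []
    start = s.find(site)
    while start != -1:
        positions.append(start)
        start = s.find(site, start + 1)
    return positions


if __name__ == "__main__":
    # Hypothetical vector fragment containing a single EcoRI site.
    fragment = "TTGACAGAATTCGGATCC"
    print("EcoRI sites at:", find_sites(fragment))
```

A vector with exactly one occurrence of the chosen site is generally the convenient case, since cutting then linearises the vector at a single, known position.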

The DNA sequence in the expression vector is operatively linked to an appropriate
expression control sequence(s) (promoter) to direct mRNA synthesis. As representative
examples of such promoters, there may be mentioned: LTR or SV40 promoter, the E. coli
lac or trp, the phage lambda PL promoter and other promoters known to control expression
of genes in eukaryotic or prokaryotic cells or their viruses. The expression vector also
contains a ribosome binding site for translation initiation and a transcription terminator.
The vector may also include appropriate sequences for amplifying expression.

In addition, the expression vectors preferably contain one or more selectable
marker genes to provide a phenotypic trait for selection of transformed host cells such as
dihydrofolate reductase or neomycin resistance for eukaryotic cell culture, or such as
tetracycline or ampicillin resistance in E. coli.

The gene can be placed under the control of a promoter, ribosome binding site (for
bacterial expression) and, optionally, an operator (collectively referred to herein as
"control" elements), so that the DNA sequence encoding the desired protein is transcribed
into RNA in the host cell transformed by a vector containing this expression construction.
The coding sequence may or may not contain a signal peptide or leader sequence. The
polypeptides of the present invention can be expressed using, for example, the E. coli tac
promoter or the protein A gene (spa) promoter and signal sequence. Leader sequences can
be removed by the bacterial host in post-translational processing. See, e.g., U.S. Patent
Nos. 4,431,739; 4,425,437; 4,338,397. Promoter regions can be selected from any desired
gene using CAT (chloramphenicol acetyltransferase) vectors or other vectors with selectable
markers. Two appropriate vectors are pKK232-8 and pCM7. Particular named bacterial
promoters include lacI, lacZ, T3, T7, gpt, lambda PR, PL and trp. Eukaryotic promoters
include CMV immediate early, HSV thymidine kinase, early and late SV40, LTRs from
retrovirus, and mouse metallothionein-I. Selection of the appropriate vector and promoter is
well within the level of ordinary skill in the art.

In addition to control sequences, it may be desirable to add regulatory sequences
which allow for regulation of the expression of the protein sequences relative to the growth
of the host cell. Regulatory sequences are known to those of skill in the art, and examples
include those which cause the expression of a gene to be turned on or off in response to a
chemical or physical stimulus, including the presence of a regulatory compound. Other
types of regulatory elements may also be present in the vector, for example, enhancer
sequences.

In some cases, it may be desirable to add sequences which cause the secretion of 
the polypeptide from the host organism, with subsequent cleavage of the secretory signal. 

Polypeptides can be expressed in host cells under the control of appropriate 
promoters. Cell-free translation systems can also be employed to produce such proteins 
using RNAs derived from the DNA constructs of the present invention. Appropriate 
cloning and expression vectors for use with prokaryotic and eukaryotic hosts are described 
by Sambrook, et al., Molecular Cloning: A Laboratory Manual, Second Edition, Cold
Spring Harbor, N.Y., (1989), the disclosure of which is hereby incorporated by reference. 

Following transformation of a suitable host strain and growth of the host strain to 
an appropriate cell density, the selected promoter is induced by appropriate means (e.g., 
temperature shift or chemical induction) and cells are cultured for an additional period. 

Cells are typically harvested by centrifugation, disrupted by physical or chemical
means, and the resulting crude extract retained for further purification. 

Microbial cells employed in expression of proteins can be disrupted by any 
convenient method, including freeze-thaw cycling, sonication, mechanical disruption, or 
use of cell lysing agents; such methods are well known to those skilled in the art.

Depending on the expression system and host selected, the polypeptide of the 
present invention may be produced by growing host cells transformed by an expression 
vector described above under conditions whereby the polypeptide of interest is expressed. 
The polypeptide is then isolated from the host cells and purified. If the expression system 
secretes the polypeptide into growth media, the polypeptide can be purified directly from 
the media. If the polypeptide is not secreted, it is isolated from cell lysates or recovered 
from the cell membrane fraction. Where the polypeptide is localized to the cell surface, 
whole cells or isolated membranes can be used as an assayable source of the desired gene 
product. Polypeptide expressed in bacterial hosts such as E. coli may require isolation from 
inclusion bodies and refolding. Where the mature protein has a very hydrophobic region 
which leads to an insoluble product of overexpression, it may be desirable to express a 
truncated protein in which the hydrophobic region has been deleted. The selection of 
appropriate growth conditions and recovery methods is within the skill of the art. 

The polypeptide can be recovered and purified from recombinant cell cultures by 
methods including ammonium sulphate or ethanol precipitation, acid extraction, anion or 
cation exchange chromatography, phosphocellulose chromatography, hydrophobic 
interaction chromatography, affinity chromatography, hydroxylapatite chromatography and 
lectin chromatography. Protein refolding steps can be used, as necessary, in completing 
configuration of the mature protein. Finally, high performance liquid chromatography 
(HPLC) can be employed for final purification steps. 

Depending upon the host employed in a recombinant production procedure, the 
polypeptides of the present invention may be glycosylated or may be non-glycosylated. 
Polypeptides of the invention may also include an initial methionine amino acid residue. 

A "replicon" is any genetic element (e.g., plasmid, chromosome, virus) that 
functions as an autonomous unit of DNA replication in vivo; i.e., capable of replication 
under its own control. 

A "vector" is a replicon, such as a plasmid, phage, or cosmid, to which another 
DNA segment may be attached so as to bring about the replication of the attached segment. 

A "double-stranded DNA molecule" refers to the polymeric form of 
deoxyribonucleotides (bases adenine, guanine, thymine, or cytosine) in a double-stranded 
helix, both relaxed and supercoiled. This term refers only to the primary and secondary 
structure of the molecule, and does not limit it to any particular tertiary forms. Thus, this 
term includes double-stranded DNA found, inter alia, in linear DNA molecules (e.g., 
restriction fragments), viruses, plasmids, and chromosomes. In discussing the structure of 
particular double-stranded DNA molecules, sequences may be described herein according 
to the normal convention of giving only the sequence in the 5' to 3' direction along the 
nontranscribed strand of DNA (i.e., the strand having the sequence homologous to the 
mRNA). 

A DNA "coding sequence of" or a "nucleotide sequence encoding" a particular 
protein, is a DNA sequence which is transcribed and translated into a polypeptide when 
placed under the control of appropriate regulatory sequences. 

A "promoter sequence" is a DNA regulatory region capable of binding RNA 
polymerase in a cell and initiating transcription of a downstream (3' direction) coding 
sequence. For purposes of defining the present invention, the promoter sequence is bounded 
at the 3' terminus by a translation start codon (e.g., ATG) of a coding sequence and extends 
upstream (5' direction) to include the minimum number of bases or elements necessary to 
initiate transcription at levels detectable above background. Within the promoter sequence 
will be found a transcription initiation site (conveniently defined by mapping with nuclease 
S1), as well as protein binding domains (consensus sequences) responsible for the binding 
of RNA polymerase. Eukaryotic promoters will often, but not always, contain "TATA" 
boxes and "CAT" boxes. Prokaryotic promoters contain Shine-Dalgarno sequences in 
addition to the -10 and -35 consensus sequences. 

DNA "control sequences" refers collectively to promoter sequences, ribosome 
binding sites, polyadenylation signals, transcription termination sequences, upstream 
regulatory domains, enhancers, and the like, which collectively provide for the expression 
(i.e., the transcription and translation) of a coding sequence in a host cell. 

A control sequence "directs the expression" of a coding sequence in a cell when 
RNA polymerase will bind the promoter sequence and transcribe the coding sequence into 
mRNA, which is then translated into the polypeptide encoded by the coding sequence. 

A "host cell" is a cell which has been transformed or transfected, or is capable of 
transformation or transfection by an exogenous DNA sequence. 

A cell has been "transformed" by exogenous DNA when such exogenous DNA has 
been introduced inside the cell membrane. Exogenous DNA may or may not be integrated 
(covalently linked) into chromosomal DNA making up the genome of the cell. In 
prokaryotes and yeasts, for example, the exogenous DNA may be maintained on an 
episomal element, such as a plasmid. With respect to eukaryotic cells, a stably transformed 
or transfected cell is one in which the exogenous DNA has become integrated into the 
chromosome so that it is inherited by daughter cells through chromosome replication. This 
stability is demonstrated by the ability of the eukaryotic cell to establish cell lines or clones 
comprised of a population of daughter cells containing the exogenous DNA. 

A "clone" is a population of cells derived from a single cell or common ancestor by 
mitosis. A "cell line" is a clone of a primary cell that is capable of stable growth in vitro 
for many generations. 

A "heterologous" region of a DNA construct is an identifiable segment of DNA 
within or attached to another DNA molecule that is not found in association with the other 
molecule in nature. 

In accordance with yet a further aspect of the present invention, there is provided 
the use of a polypeptide of the invention for therapeutic or prophylactic purposes, for 
example, as an antibacterial agent or a vaccine. 



In accordance with another aspect of the present invention, there is provided the 
use of a polynucleotide of the invention for therapeutic or prophylactic purposes, in 
particular genetic immunisation. 

In accordance with yet another aspect of the present invention, there are provided 
inhibitors to such polypeptides, useful as antibacterial agents. In particular, there are 
provided antibodies against such polypeptides. 

Another aspect of the invention is a pharmaceutical composition comprising the 
above polypeptide, polynucleotide or inhibitor of the invention and a pharmaceutically 
acceptable carrier. 

In a particular aspect the invention provides the use of an inhibitor of the invention 
as an antibacterial agent. 

The invention further relates to the manufacture of a medicament for such uses. 

The polypeptide may be used as an antigen for vaccination of a host to produce 
specific antibodies which have anti-bacterial action. 

The polypeptides or cells expressing them can be used as an immunogen to produce 
antibodies thereto. These antibodies can be, for example, polyclonal or monoclonal 
antibodies. The term antibodies also includes chimeric, single chain, and humanized 
antibodies, as well as Fab fragments, or the product of an Fab expression library. Various 
procedures known in the art may be used for the production of such antibodies and 
fragments. 

Antibodies generated against the polypeptides of the present invention can be 
obtained by direct injection of the polypeptides into an animal or by administering the 
polypeptides to an animal, preferably a nonhuman. The antibody so obtained will then bind 
the polypeptides themselves. In this manner, even a sequence encoding only a fragment of the 
polypeptides can be used to generate antibodies binding the whole native polypeptides. 
Such antibodies can then be used to isolate the polypeptide from tissue expressing that 
polypeptide. 

Polypeptide derivatives include antigenically or immunologically equivalent 
derivatives which form a particular aspect of this invention. 

The term 'antigenically equivalent derivative' as used herein encompasses a 
polypeptide or its equivalent which will be specifically recognised by certain antibodies 
which, when raised to the protein or polypeptide according to the present invention, 
interfere with the interaction between pathogen and mammalian host. 

The term 'immunologically equivalent derivative' as used herein encompasses a 
peptide or its equivalent which, when used in a suitable formulation to raise antibodies in a 
vertebrate, yields antibodies that act to interfere with the interaction between pathogen and 
mammalian host. 

In particular, derivatives which are slightly longer or slightly shorter than the native 
protein or polypeptide fragment of the present invention may be used. In addition, 
polypeptides in which one or more of the amino acid residues are modified may be used. 
Such peptides may, for example, be prepared by substitution, addition, or rearrangement of 
amino acids or by chemical modification thereof. All such substitutions and modifications 
are generally well known to those skilled in the art of peptide chemistry. 

The polypeptide, such as an antigenically or immunologically equivalent derivative 
or a fusion protein thereof, is used as an antigen to immunize a mouse or other animal such 
as a rat or chicken. The fusion protein may provide stability to the polypeptide. The 
antigen may be associated, for example by conjugation, with an immunogenic carrier 
protein, for example bovine serum albumin (BSA) or keyhole limpet haemocyanin (KLH). 
Alternatively, a multiple antigenic peptide comprising multiple copies of the protein or 
polypeptide, or an antigenically or immunologically equivalent polypeptide thereof, may be 
sufficiently antigenic to improve immunogenicity so as to obviate the use of a carrier. 

For preparation of monoclonal antibodies, any technique which provides antibodies 
produced by continuous cell line cultures can be used. Examples include the hybridoma 
technique (Kohler and Milstein, 1975, Nature, 256:495-497), the trioma technique, the 
human B-cell hybridoma technique (Kozbor et al., 1983, Immunology Today 4:72), and the 
EBV-hybridoma technique to produce human monoclonal antibodies (Cole, et al., 1985, in 
Monoclonal Antibodies and Cancer Therapy, Alan R. Liss, Inc., pp. 77-96). 

Techniques described for the production of single chain antibodies (U.S. Patent 
4,946,778) can be adapted to produce single chain antibodies to immunogenic polypeptide 
products of this invention. 

Using the procedure of Kohler and Milstein (supra, 1975), antibody-containing 
cells from the immunised mammal are fused with myeloma cells to create hybridoma cells 
secreting monoclonal antibodies. 

The hybridomas are screened to select a cell line with high binding affinity and 
favorable cross reaction with other staphylococcal species using one or more of the original 
polypeptide and/or the fusion protein. The selected cell line is cultured to obtain the 
desired Mab. 

Hybridoma cell lines secreting the monoclonal antibody are another aspect of this 
invention. 

Alternatively, phage display [...] 
fusion protein. 

As mentioned above, a fragment of the final antibody may be prepared. 

The antibody may be either intact antibody of Mr approx 150,000 or a derivative of 
it, for example a Fab fragment or a Fv fragment as described in Skerra, A and Pluckthun, A 
(1988) Science 240, 1038-1040. If two antigen binding domains are present each domain 
may be directed against a different epitope - termed 'bispecific' antibodies. 

The antibody of the invention may be prepared by conventional means, for example 
by established monoclonal antibody technology (Kohler, G. and Milstein, C. supra (1975)) 
or using recombinant means e.g. combinatorial libraries, for example as described in Huse, 
W.D. et al., (1989) Science 246, 1275-1281. 

Preferably the antibody is prepared by expression of a DNA polymer encoding said 
antibody in an appropriate expression system such as described above for the expression of 
polypeptides of the invention. The choice of vector for the expression system will be 
determined in part by the host, which may be a prokaryotic cell, such as E. coli (preferably 
strain B) or Streptomyces sp. or a eukaryotic cell, such as a mouse C127, mouse myeloma, 
human HeLa, Chinese hamster [...]. The 
host may also be a transgenic animal or a transgenic plant [for example as described in 
Hiatt, A. et al., (1989) Nature 342, 76-78]. Suitable vectors include plasmids, bacteriophages, 
cosmids and recombinant viruses, derived from, for example, baculoviruses and vaccinia. 

The Fab fragment may also be prepared from its parent monoclonal antibody by 
enzyme treatment, for example using papain to cleave the Fab portion from the Fc portion. 

Preferably the antibody or derivative thereof is modified to make it less 
immunogenic in the patient. For example, if the patient is human the antibody may most 
preferably be 'humanised', where the complementarity determining region(s) of the 
hybridoma-derived antibody has been transplanted into a human monoclonal antibody, for 
example as described in Jones, P. et al. (1986), Nature 321, 522-525 or Tempest et 
al., (1991) Biotechnology 9, 266-273. 

The modification need not be restricted to one of 'humanisation'; other primate 
sequences (for example Newman, R. et al., 1992, Biotechnology, 10, 1455-1460) may also 
be used. 

The humanised monoclonal antibody, or its fragment having binding activity, forms 
a particular aspect of this invention. 

This invention provides a method of screening drugs to identify those which 
interfere with the proteins herein, which method comprises measuring the interference with 
the protein activity by a test drug. For example, if the protein has enzymatic activity, after 
suitable purification and formulation the activity of the enzyme can be followed by its 
ability to convert its natural substrates. By incorporating different chemically synthesised 
test compounds or natural products into such an assay of enzymatic activity one is able to 
detect those additives which compete with the natural substrate or otherwise inhibit 
enzymatic activity. 

The invention also relates to inhibitors identified thereby. 

The use of a polynucleotide of the invention in genetic immunisation will 
preferably employ a suitable delivery method such as direct injection of plasmid DNA into 
muscles (Wolff et al., Hum Mol Genet 1992, 1:363; Manthorpe et al., Hum. Gene Ther. 
1993, 4:419), delivery of DNA complexed with specific protein carriers (Wu et al., J Biol 
Chem 1989, 264:16985), coprecipitation of DNA with calcium phosphate (Benvenisty & 
Reshef, PNAS 1986, 83:9551), encapsulation of DNA in various forms of liposomes 
(Kaneda et al., Science 1989, 243:375), particle bombardment (Tang et al., Nature 1992, 
356:152; Eisenbraun et al., DNA Cell Biol 1993, 12:791) and in vivo infection using cloned 
retroviral vectors (Seeger et al., PNAS 1984, 81:5849). Suitable promoters for muscle 
transfection include CMV, RSV, SRα, actin, MCK, alpha globin, adenovirus and 
dihydrofolate reductase. 

In therapy or as a prophylactic, the active agent, i.e. the polypeptide, polynucleotide 
or inhibitor of the invention, may be administered to a patient as an injectable composition, 
for example as a sterile aqueous dispersion, preferably isotonic. 

Alternatively the composition may be formulated for topical application, 
for example in the form of ointments, creams, lotions, eye ointments, eye drops, ear drops, 
mouthwash, impregnated dressings and sutures and aerosols, and may contain appropriate 
conventional additives, including, for example, preservatives, solvents to assist drug 
penetration, and emollients in ointments and creams. Such topical formulations may also 
contain compatible conventional carriers, for example cream or ointment bases, and ethanol 
or oleyl alcohol for lotions. Such carriers may constitute from about 1% to about 98% by 
weight of the formulation; more usually they will constitute up to about 80% by weight of 
the formulation. 

For administration to human patients, it is expected that the daily dosage level of 
the active agent will be from 0.01 to 10 mg/kg, typically around 1 mg/kg. The physician in 
any event will determine the actual dosage which will be most suitable for an individual 
patient, and which will vary with the age, weight and response of the particular patient. The above 
dosages are exemplary of the average case. There can, of course, be individual instances 
where higher or lower dosage ranges are merited, and such are within the scope of this 
invention. 

A vaccine composition is conveniently in injectable form. Conventional adjuvants 
may be employed to enhance the immune response. 

A suitable unit dose for vaccination is 0.5-5 µg/kg of antigen, and such dose is 
preferably administered 1-3 times and with an interval of 1-3 weeks. 

Within the indicated dosage range, no adverse toxicological effects are expected 
with the compounds of the invention which would preclude their administration to suitable 
patients. 

EXAMPLES 

In order to facilitate understanding of the following examples certain frequently 
occurring methods and/or terms will be described. 

"Plasmids" are designated by a lower case p preceded and/or followed by capital 
letters and/or numbers. The starting plasmids herein are either commercially available, 
publicly available on an unrestricted basis, or can be constructed from available plasmids in 
accord with published procedures. In addition, equivalent plasmids to those described are 
known in the art and will be apparent to the ordinarily skilled artisan. 

"Digestion" of DNA refers to catalytic cleavage of the DNA with a restriction 
enzyme that acts only at certain sequences in the DNA. The various restriction enzymes 
used herein are commercially available and their reaction conditions, cofactors and other 
requirements were used as would be known to the ordinarily skilled artisan. For analytical 
purposes, typically 1 µg of plasmid or DNA fragment is used with about 2 units of enzyme 
in about 20 µl of buffer solution. For the purpose of isolating DNA fragments for plasmid 
construction, typically 5 to 50 µg of DNA are digested with 20 to 250 units of enzyme in a 
larger volume. Appropriate buffers and substrate amounts for particular restriction 
enzymes are specified by the manufacturer. Incubation times of about 1 hour at 37 °C are 
ordinarily used, but may vary in accordance with the supplier's instructions. After digestion 
the reaction is electrophoresed directly on a polyacrylamide gel to isolate the desired 
fragment. 

Size separation of the cleaved fragments is performed using 8 percent 
polyacrylamide gel described by Goeddel, D. et al., (1980) Nucleic Acids Res., 8:4057. 

"Oligonucleotides" refers to either a single stranded polydeoxynucleotide or two 
complementary polydeoxynucleotide strands which may be chemically synthesized. Such 
synthetic oligonucleotides have no 5' phosphate and thus will not ligate to another 
oligonucleotide without adding a phosphate with an ATP in the presence of a kinase. A 
synthetic oligonucleotide will ligate to a fragment that has not been dephosphorylated. 

"Ligation" refers to the process of forming phosphodiester bonds between two 
double stranded nucleic acid fragments (Maniatis, T., et al., Id., p. 146). Unless otherwise 
provided, ligation may be accomplished using known buffers and conditions with 10 units 
of T4 DNA ligase ("ligase") per 0.5 µg of approximately equimolar amounts of the DNA 
fragments to be ligated. 
Example 1 

Isolation of DNA from S. aureus WCUH 29 

The polynucleotide having the DNA sequence given in SEQ ID NO: 1 was obtained 
from a library of clones of chromosomal DNA of S. aureus WCUH 29 in E. coli. In some 
cases the sequencing data from two or more clones containing overlapping S. aureus 
WCUH 29 DNA was used to construct the contiguous DNA sequence in Sequences set 
forth in SEQUENCE 1 [SEQ ID Nos: 1,4,7,10,13,16,19,22,25,28,31,34,37,40,43,46, 
49,52,55,58,61,64,67,70,73,76] of Table 1. Libraries may be prepared by routine 
methods, for example: 
Methods 1 and 2 

Total cellular DNA is isolated from Staphylococcus aureus strain WCUH29 
(NCIMB 40771) according to standard procedures and size-fractionated by either of two 
methods. 
Method 1. 

Total cellular DNA is mechanically sheared by passage through a needle in order to 
size-fractionate according to standard procedures. DNA fragments of up to 11 kbp in size 
are rendered blunt by treatment with exonuclease and DNA polymerase, and EcoRI linkers 
added. Fragments are ligated into the vector Lambda ZapII that has been cut with EcoRI, 
the library packaged by standard procedures and E. coli infected with the packaged library. 
The library is amplified by standard procedures. 
Method 2. 

Total cellular DNA is partially hydrolysed with a combination of four restriction 
enzymes (RsaI, PalI, AluI and Bsh1235I) and size-fractionated according to standard 
procedures. EcoRI linkers are ligated to the DNA and the fragments then ligated into the 
vector Lambda ZapII that has been cut with EcoRI, the library packaged by standard 
procedures, and E. coli infected with the packaged library. The library is amplified by 
standard procedures. 
Example 2 

The determination of expression during infection of a gene from Staphylococcus 
aureus WCUH29 

Necrotic fatty tissue from a four day groin infection of Staphylococcus aureus 
WCUH29 in the mouse is efficiently disrupted and processed in the presence of chaotropic 
agents and RNAase inhibitor to provide a mixture of animal and bacterial RNA. The 
optimal conditions for disruption and processing to give stable preparations and high yields 
of bacterial RNA are followed by the use of hybridisation to a radiolabeled oligonucleotide 
specific to Staphylococcus aureus 16S RNA on Northern blots. The RNAase free, DNAase 
free, DNA and protein free preparations of RNA obtained are suitable for Reverse 
Transcription PCR (RT-PCR) using unique primer pairs designed from the sequence of 
each gene of Staphylococcus aureus WCUH29. 

a) Isolation of tissue infected with Staphylococcus aureus WCUH29 from a mouse 
animal model of infection 

10 ml volumes of sterile nutrient broth (No. 2 Oxoid) are seeded with isolated, 
individual colonies of Staphylococcus aureus WCUH29 from an agar culture plate. The 
cultures are incubated aerobically (static culture) at 37 °C for 16-20 hours. 4 week 
old mice (female, 18g-22g, strain MF1) are each infected by subcutaneous injection of 
0.5 ml of this broth culture of Staphylococcus aureus WCUH29 (diluted in broth to 
approximately 10^8 cfu/ml) into the anterior, right lower quadrant (groin area). Mice should 
be monitored regularly during the first 24 hours after infection, then daily until termination 
of study. Animals with signs of systemic infection, i.e. lethargy, ruffled appearance, 
isolation from group, should be monitored closely and if signs progress to moribundancy, 
the animal should be culled immediately. 
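
As a rough check on the inoculum described above, the dose per mouse follows from the injected volume and the approximate culture density. The short Python sketch below only restates that arithmetic; the function name is illustrative and is not part of the described procedure.

```python
def inoculum_cfu(volume_ml: float, density_cfu_per_ml: float) -> float:
    """Return the approximate number of colony-forming units injected."""
    return volume_ml * density_cfu_per_ml


# Values taken from the text: 0.5 ml of broth culture at roughly 10^8 cfu/ml.
dose = inoculum_cfu(0.5, 1e8)
print(f"approximate inoculum: {dose:.1e} cfu per mouse")
```

Because the culture density is itself only approximate, the result should be read as an order-of-magnitude estimate.
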

Visible external signs of lesion development will be seen 24-48h after infection. 
Examination of the abdomen of the animal will show the raised outline of the abscess 
beneath the skin. The localised lesion should remain in the right lower quadrant, but may 
occasionally spread to the left lower quadrant, and superiorly to the thorax. On occasions, 
the abscess may rupture through the overlying skin layers. In such cases the affected 
animal should be culled immediately and the tissues sampled if possible. Failure to cull the 
animal may result in the necrotic skin tissue overlying the abscess being sloughed off, 
exposing the abdominal muscle wall. 

Approximately 96h after infection, animals are killed using carbon dioxide 
asphyxiation. To minimise delay between death and tissue processing/storage, mice should 
be killed individually rather than in groups. The dead animal is placed onto its back and the 
fur swabbed liberally with 70% alcohol. An initial incision using scissors is made through 
the skin of the abdominal left lower quadrant, travelling superiorly up to, then across the 
thorax. The incision is completed by cutting inferiorly to the abdominal lower right 
quadrant. Care should be taken not to penetrate the abdominal wall. Holding the skin flap 
with forceps, the skin is gently pulled away from the abdomen. The exposed abscess, which 
covers the peritoneal wall but generally does not penetrate the muscle sheet completely, is 
excised, taking care not to puncture the viscera. 

The abscess/muscle sheet and other infected tissue may require cutting in sections, 
prior to flash-freezing in liquid nitrogen, thereby allowing easier storage in plastic 
collecting vials. 

b) Isolation of Staphylococcus aureus WCUH29 RNA from infected tissue samples 

4-6 infected tissue samples (each approx 0.5-0.7g) in 2ml screw-cap tubes are 
removed from -80 °C storage into a dry ice ethanol bath. In a microbiological safety cabinet 
the samples are disrupted individually whilst the remaining samples are kept cold in the dry 
ice ethanol bath. To disrupt the bacteria within the tissue sample, 1ml of TRIzol Reagent 
(Gibco BRL, Life Technologies) is added followed by enough 0.1mm zirconia/silica beads 
to almost fill the tube; the lid is replaced, taking care not to get any beads into the screw 
thread so as to ensure a good seal and eliminate aerosol generation. The sample is then 
homogenised in a Mini-BeadBeater Type BX-4 (Biospec Products). Necrotic fatty tissue is 
treated for 100 seconds at 5000 rpm in order to achieve bacterial lysis. In vivo grown 
bacteria require longer treatment than in vitro grown S.aureus WCUH29 which are 
disrupted by a 30 second bead-beat. 

After bead-beating the tubes are chilled on ice before opening in a fume-hood as 
heat generated during disruption may degrade the TRIzol and release cyanide. 

200 microlitres of chloroform is then added and the tubes shaken by hand for 15 
seconds to ensure complete mixing. After 2-3 minutes at room temperature the tubes are 
spun down at 12,000 x g, 4 °C for 15 minutes and RNA extraction is then continued 
according to the method given by the manufacturers of TRIzol Reagent, i.e.: the aqueous 
phase, approx 0.6 ml, is transferred to a sterile eppendorf tube and 0.5 ml of isopropanol is 
added. After 10 minutes at room temperature the samples are spun at 12,000 x g, 4 °C for 
10 minutes. The supernatant is removed and discarded, then the RNA pellet is washed with 
1 ml 75% ethanol. A brief vortex is used to mix the sample before centrifuging at 7,500 x g, 
4 °C for 5 minutes. The ethanol is removed and the RNA pellet dried under vacuum for no 
more than 5 minutes. Samples are then resuspended by repeated pipetting in 100 
microlitres of DEPC treated water, followed by 5-10 minutes at 55 °C. Finally, after at 
least 1 minute on ice, 200 units of Rnasin (Promega) is added. 

RNA preparations are stored at -80 °C for up to one month. For longer term storage 
the RNA precipitate can be stored at the wash stage of the protocol in 75% ethanol for at 
least one year at -20 °C. 

Quality of the RNA isolated is assessed by running samples on 1% agarose gels. 1 
x TBE gels stained with ethidium bromide are used to visualise total RNA yields. To 
demonstrate the isolation of bacterial RNA from the infected tissue, 1 x MOPS, 2.2M 
formaldehyde gels are run and vacuum blotted to Hybond-N (Amersham). The blot is then 
hybridised with a 32P labelled oligonucleotide probe specific to 16s rRNA of S. aureus 
(K. Greisen, M. Loeffelholz, A. Purohit and D. Leong (1994) J. Clin. Microbiol. 32, 335-351). 
An oligonucleotide of the sequence:- 

5'-gctcctaaaaggttactccaccggc-3' [SEQ ID NO:91] 

is used as a probe. The size of the hybridising band is compared to that of control RNA 
isolated from in vitro grown S. aureus WCUH29 in the Northern blot. Correct sized 
bacterial 16s rRNA bands can be detected in total RNA samples which show extensive 
degradation of the mammalian RNA when visualised on TBE gels. 
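
For illustration, basic properties of the probe quoted above can be checked with a few lines of Python. This is a minimal sketch, not part of the described method: the helper names are hypothetical, and the reverse complement is given in the DNA alphabet, so the corresponding rRNA target would carry u in place of t.

```python
PROBE = "gctcctaaaaggttactccaccggc"  # 16S rRNA probe from the text, written 5'->3'

COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.lower()))


def gc_fraction(seq: str) -> float:
    """Return the fraction of bases that are g or c."""
    seq = seq.lower()
    return (seq.count("g") + seq.count("c")) / len(seq)


target = reverse_complement(PROBE)
print(len(PROBE), round(gc_fraction(PROBE), 2), target)
```
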

c) The removal of DNA from Staphylococcus aureus WCUH29 derived RNA 

DNA was removed from 73 microlitre samples of RNA by a 15 minute treatment 
on ice with 3 units of DNAaseI, amplification grade (Gibco BRL, Life Technologies) in the 
buffer supplied with the addition of 200 units of Rnasin (Promega) in a final volume of 90 
microlitres. 

The DNAase was inactivated and removed by treatment with TRIzol LS Reagent 
(Gibco BRL, Life Technologies) according to the manufacturer's protocol. 

DNAase treated RNA was resuspended in 73 microlitres of DEPC treated water with the 
addition of Rnasin as described in Method 1. 

d) The preparation of cDNA from RNA samples derived from infected tissue 

10 microlitre samples of DNAase treated RNA are reverse transcribed using a 
SuperScript Preamplification System for First Strand cDNA Synthesis kit (Gibco BRL, Life 
Technologies) according to the manufacturer's instructions. 1 nanogram of random 
hexamers is used to prime each reaction. Controls without the addition of SuperScript II 
reverse transcriptase are also run. Both +/-RT samples are treated with RNaseH before 
proceeding to the PCR reaction. 

e) The use of PCR to determine the presence of a bacterial cDNA species 

PCR reactions are set up on ice in 0.2ml tubes by adding the following 
components: 

45 microlitres PCR SUPERMIX (Gibco BRL, Life Technologies). 

1 microlitre 50mM MgCl2, to adjust final concentration to 2.5mM. 

1 microlitre PCR primers (optimally 18-25 basepairs in length and 
designed to possess similar annealing temperatures), each primer at 10mM 
initial concentration. 

2 microlitres cDNA. 
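
The dilution arithmetic behind the component list can be sketched as follows. This is an illustrative calculation only: it assumes that the primer entry represents 1 microlitre in total and that the listed volumes simply add, neither of which is stated explicitly in the text.

```python
# Component volumes in microlitres, as listed in the text.
# Assumption: the "1 microlitre PCR primers" entry is 1 microlitre in total.
components_ul = {
    "PCR SUPERMIX": 45.0,
    "50 mM MgCl2": 1.0,
    "PCR primers": 1.0,
    "cDNA": 2.0,
}

total_ul = sum(components_ul.values())


def diluted_concentration(stock_mm: float, added_ul: float, final_ul: float) -> float:
    """Concentration (mM) contributed by a stock after dilution into final_ul."""
    return stock_mm * added_ul / final_ul


added_mgcl2_mm = diluted_concentration(50.0, components_ul["50 mM MgCl2"], total_ul)
print(f"total volume: {total_ul:.0f} microlitres")
print(f"MgCl2 contributed by the added stock: {added_mgcl2_mm:.2f} mM")
```

On these assumptions the added stock supplies roughly 1 mM, so the stated 2.5mM final concentration would require the remainder to come from the SUPERMIX itself, whose composition is not given here.
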

PCR reactions are run on a Perkin Elmer GeneAmp PCR System 9600 as follows: 
5 minutes at 95 °C, then 50 cycles of 30 seconds each at 94 °C, 42 °C and 
72 °C followed by 3 minutes at 72 °C and then a hold temperature 
of 4 °C. (the number of cycles is optimally 30-50 to determine the appearance or lack of a 
PCR product and optimally 8-30 cycles if an estimation of the starting quantity of cDNA 
from the RT reaction is to be made). 
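
To give a feel for how the cycle number affects run length, the programmed hold times of the profile above can be summed. The sketch below is illustrative only; the function name is hypothetical, and ramping between temperatures and the final 4 °C hold are ignored.

```python
def programmed_hold_seconds(cycles: int,
                            initial_s: int = 5 * 60,
                            step_s: int = 30,
                            steps_per_cycle: int = 3,
                            final_extension_s: int = 3 * 60) -> int:
    """Sum of programmed hold times: initial step, all cycles, final extension."""
    return initial_s + cycles * steps_per_cycle * step_s + final_extension_s


# Cycle numbers spanning the ranges mentioned in the text.
for n in (8, 30, 50):
    print(n, "cycles:", programmed_hold_seconds(n) / 60, "minutes")
```
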

10 microlitre aliquots are then run out on 1% 1 x TBE gels stained with ethidium 
bromide with PCR product, if present, sizes estimated by comparison to a 100 bp DNA 
Ladder (Gibco BRL, Life Technologies). Alternatively if the PCR products are 
conveniently labelled by the use of a labelled PCR primer (e.g. labelled at the 5' end with a 
dye) a suitable aliquot of the PCR product is run out on a polyacrylamide sequencing gel 
and its presence and quantity detected using a suitable gel scanning system (e.g. ABI 
Prism™ 377 Sequencer using GeneScan™ software as supplied by Perkin Elmer). 

RT/PCR controls may include +/- reverse transcriptase reactions, 16s rRNA 
primers or DNA specific primer pairs designed to produce PCR products from 
non-transcribed S. aureus WCUH29 genomic sequences. 

To test the efficiency of the primer pairs they are used in DNA PCR with WCUH29 
total DNA. PCR reactions are set up and run as described above using approx. 1 microgram 
of DNA in place of the cDNA and 35 cycles of PCR. 

Primer pairs which fail to give the predicted sized product in either DNA PCR or
RT/PCR are PCR failures and as such are uninformative. Of those which give the correct
size product with DNA PCR, two classes are distinguished in RT/PCR:

1. Genes which are not transcribed in vivo reproducibly fail to give a product in
RT/PCR.

2. Genes which are transcribed in vivo reproducibly give the correct size product in
RT/PCR and show a stronger signal in the +RT samples than the signal (if at all
present) in -RT controls.
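
The decision rules above can be sketched as a small function. This is an illustrative reading of the text, not a procedure stated in it: the numeric band intensities are hypothetical inputs, a single call represents what the text requires to be observed reproducibly, and the case of a product that is no stronger than the -RT control is flagged separately because the text does not assign it to either class.

```python
def classify_primer_pair(dna_pcr_ok, plus_rt_signal, minus_rt_signal):
    """Classify one primer pair from its DNA PCR and RT/PCR results.

    dna_pcr_ok      -- True if DNA PCR gave the predicted size product
    plus_rt_signal  -- intensity of the correct-size product in +RT (0 if none)
    minus_rt_signal -- intensity in the -RT control (0 if none)
    """
    if not dna_pcr_ok:
        return "PCR failure (uninformative)"
    if plus_rt_signal == 0:
        return "not transcribed in vivo"
    if plus_rt_signal > minus_rt_signal:
        return "transcribed in vivo"
    # Product present but not above the -RT control: not covered by the text.
    return "inconclusive (signal not above -RT control)"

# Hypothetical intensity values, for illustration only.
print(classify_primer_pair(True, 120.0, 5.0))
print(classify_primer_pair(True, 0.0, 0.0))
print(classify_primer_pair(False, 80.0, 0.0))
```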

The following nucleotide sequences (sequences set forth in SEQUENCE 1 [SEQ ID
Nos: 1,4,7,10,13,16,19,22,25,28,31,34,37,40,43,46,49,52,55,58,61,64,67,70,73,76] of
Table 1) were identified in the above test as transcribed in vivo. Each set of sequences
relates to a separate gene (Gene #). Deduced amino acid sequences are given where
available as the sequences set forth in each SEQUENCE 2 [SEQ ID Nos:
79,80,81,82,83,84,85,86,87,88,89,90] of Table 1. The pair of PCR primers used to
identify the gene are given as the sequences set forth in SEQUENCE 3 [SEQ ID Nos:
2,5,8,11,14,17,20,23,26,29,32,35,38,41,44,47,50,53,56,59,62,65,68,71,74,77] and 4 [SEQ ID
Nos: 3,6,9,12,15,18,21,24,27,30,33,36,39,42,45,48,51,54,57,60,63,66,69,72,75,78]
of Table 1. Homologies to known genes are given where determined and represent the
putative identification of gene function for each gene in Table 1.
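
The SEQ ID numbering of the nucleotide sequences and primers follows a regular pattern: each gene takes three consecutive numbers. A minimal sketch of that pattern is given below; the function name is illustrative, and the deduced amino acid sequences (SEQ ID Nos 79 to 90) are deliberately left out because they are only given for some genes and do not follow the same rule.

```python
def seq_ids_for_gene(gene_number):
    """Return SEQ ID NOs for a gene's nucleotide sequence and its two primers."""
    # The SEQUENCE 1 list runs from 1 to 76 in steps of 3, i.e. 26 genes.
    if not 1 <= gene_number <= 26:
        raise ValueError("Table 1 numbering covers Gene #1 to Gene #26")
    first = 3 * gene_number - 2
    return {"SEQUENCE 1": first, "SEQUENCE 3": first + 1, "SEQUENCE 4": first + 2}

for gene in (1, 8, 26):
    print(gene, seq_ids_for_gene(gene))
```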

TABLE 1 

Gene #1 

E.coli pts system 5' end ptfB 
SEQUENCE 1 [SEQ ID NO:1]

1 X TSTTTPCTTP ATGATTGCCT AATTCAATCA CATCTTTACT
51 Tric~&. a RTrTir ibLAAAlLAt GCAATTGACC ATNTGGATCT CGTCTATCAT
101 AGTCATAAAT ACGGTATGTC GTATCGGATG ATTGTTGTGT CTCTAAAATT
151 AAAATACCCG AACCAATGGC AIGGACAGTG CCAGCAGGAA CATAATAAAA
201 C \J X GTCACCGGGC TTAACAGGTA intb ill GAA AAGACTGCCA 7\ Timm^*m^»i\m AAT t cat gat
251 TATCAATCAT GTCGATTAAC Gv_^XGlTTAT TATGTGLATG GACGCCATAA
301 TATAATTTCA GCACCTGGGC ■1 wV^ri lui rvrVrV llulbi 1 1 in
351 CCTAGTTCGC CTTCGTGTTT TAAAGCGTAG TCATCATCTG GATGAACTTG
401 AACAGATAAT TTATCATTGG CATCTAATAC TTTAGTTAGC AGAGGGAAAC
451 TATCTCGTGA ATCATTATCG AATAATTCAC GATGTTGTGA CCAAAGTTGA
501 TCTAGGGTCA TATCCTTGTA TGGACCATTG ATAATTGTAT TAGGACCATT
551 TGGATGTGCA GAAATTGCCC AGCATTCACC AGTTGTTTCA TTAGGGATAT
601 CATAGTTAAA TGCTTTTAAT GCATGACCGC CCCAAATTCT GTCTTTAAAA
651 ACGGGTTGTA AAAATAATGC CATAGTTAAA ACTCCTCTAT ATTTTCATTA
701 ATAAGTTATA AATTTCTGTA GTACTGTTGG CATTAATTAG TGATTGGCGT
751 GTCTCATCAT TCATTAACGC TTTAGATAAG CGCTGAAGTA TTTTTAAATG
801 TGTATCCTGA CTGTTGTTTG GTACGGCAAT TAAGAATATC AATTGAGGTA
851 GACTACCATC TAGACTGTCC CATTTAACAC CATGATTATT TTTCATAACA
901 GCTACAATCG GTTGTTTTAC AACATCAGAC TTTGCATGTG GAATGGCCAC
951 GTTCATGCCA ATAGCTGTCG TAGACTCCAT TTCACGTTCT AGTATTGCAT
1001 TTTTTAAATG CGATGTGTGC TCTACATAAC GGCAAATTTT AAGTTTATGA
1051 ATCAACATAT CAATTGCTTC GTTTCGAGAC ATGTCGTGAT CAGTAATTAT 
1101 CATAGTTTGT TGATCAAAAA CATGAGAAGG TTTATTGAGA TGTGAATGTT 
1151 TCGCTCGTGC CATCNACATT GTCAACCTCT GTATCATGTT GTGTAATATC 
1201 TGTATCATGA AGTTGCGTGT GTTGCGCTGG TGCATCTACT GCTATAACTG 
1251 GTGTATTGCG TNTTAATAAT AGTACAGTAG GCATTGTGAC AAGACTACCT 
1301 ACTATCNCTC CAAAGATAAA CCATAATACA TGATCAATAC CACCTAATAC

1351 AGCCACGATT GGACCTCCAT GTGCGACTCT ATCGCCGACA CCACCAATGN 
1401 CTGCAATGAC TGATGCAATC ATTGCACCAA TGATGTTTGC AGGTATAATG 
1451 CGCAATGGAT CTTGGGCTGC GAAAGGAATA GCACCTTCAG TAATNCCAAA 
1501 TAGTCCCATA GTGAAGGNAG CCTTACCCAT TTCTCTTTCG GAATGATTGA 
1551 ATTTATACTT NTGAACANAC GTTGCTAAAC CTAAACCGAT TGGTGGTGTA

1601 CATACANCAA CTGCGACCAT ACCCATAACG GCGTAATTAC CTTCAGCAAT 
1651 AAGTGCTGAG CCAAATAAAA ATGCTACCTT GTTTAATTGG ACCGCCCATA 
1701 TCGAAGGCGA TCATCGCACC TATAATCATC GACAAGTATA ATAATATTAG 
1751 CACCTTGCAT ACTTTTTAAC CAGGGTTGTT AGGAATGCCG CAAAAATATT 
1801 AGAAATCGTG CACCGATTAA AAATATAAAT ATCAATCCTA ACAACGACCG 
1851 ATGAAATAAT GGGAATAATA ATGATAGGCA TAATTGGTGC CATTGCTTTT 
1901 GGAACTTTAA TATCTTTAAT CCACTTTGCG ATATAACCTG CTAAGAAACC 
1951 AGCAACAATA CCACCTAAAA ATCCTGCGCC TGCATCACTG CCATAAAAAC 
2001 TACCGTCAGC AGCGATAGCG CCGCCAATCA TACCAGGAAC AAGACCGGGC 
2051 TTGTCAGCGA TACTAACAGC GATATATCCA GCTCGTGCCG AATTCGGCAC 
2101 GAGCTCGTGC C 

SEQUENCE 2 (STOPS SHORT) [SEQ ID NO: 79]

1 MGMVAVXVCT PPIGLGLATX VXKYKFNHSE REMGKAXFTM GLFGITEGAI 

51 PFAAQDPLRI IPANIIGAMI ASVIAXIGGV GDRVAHGGPI VAVLGGIDHV 
101 LWFIFGXIVG SLVTMPTVLL LXRNTPVIAV DAPAQHTQLH DTDITQHDTE 
151 VDNVDGTSET FTSQ* 
SEQUENCE 3 [SEQ ID NO: 2]
accctctgta tcatgttg

SEQUENCE 4 [SEQ ID NO: 3]
gtgcgatgat cgccttgg

Gene #2
E.coli RelA

SEQUENCE 1 [SEQ ID NO: 4]

1 CGGCTCTTCG TAATATTGAT AATGTGCAAT ATTTNAAGAA TAATCAATTT
51 ATTGAAGAAG AAACCGTAGT GACCGTGAGC GAATATCGAA NCGGCTATTG
101 ATAGAATACG TACTGAAATG GACCCGAATG AATATCGAAG NCGATATAAA
151 TGGTAGACCT AAACATATTT ACAGTATTTA TCGGNAAATG ATGAAGCAGA
201 AAAAACAATT TGATCAAATT TTTGATTTGT TGGCGATACG TGTTATTGTC
251 AATTCTATTA ATGATTGTTA TGCGATACTT GGGTTGGTGC ATACGTTATG
301 GAAACCGATG CCAGGACGTT TTAAAGATTA TAXTGCAATG CCTAAACAAA
351 ATTTGTATCA GTCATTGCAT ACTACAGTAG TAGGTCCAAA TGGAGACCCG
401 CTCGAAATCC AAATACGAAC GTTTGATATG CACGAAATTG CTGAGCATGG
451 TGTTGCAGCA CACTGGGCTT ACAAAGAAGG TAAAAAAGTA AGTGAAAAAG
501 ATCAAACTTA TCAAAATAAG TTAAATTGGT TAAAAGAATT AGCTGAAGCG
551 GATCATACAT CGTCTGACGC TCAAGAATTT ATGGAAACCT TATAATATGA
601 CTTACAGAGT GACAAAGTAT ACGCATTTAC CCCAGGGAGT GATGTTATTG
651 AGTNGGCATA TGGTGCTGTG CCGATTGGAT TTTGGCTTAT GCGAATCACA
701 GGGAANGTAG GTAATAAGAT GATTGGCGCC CAGGTGGAAT GGCAAAATTG
751 TACCANATTG ACTTATNTTT TCACAAAACA GGCGGATATT GTTGGAAATA
801 CCGTTCTAG

SEQUENCE 2 [SEQ ID NO: 80]

1 MNIEXDINGR PKHIYSIYRX MMKQKKQFDQ IFDLLAIRVI VNSINDCYAI
51 LGLVHTLWKP MPGRFKDYIA MPKQNLYQSL HTTVVGPNGD PLEIQIRTFD
101 MHEIAEHGVA AHWAYKEGKK VSEKDQTYQN KLNWLKELAE ADHTSSDAQE
151 FMETL*

SEQUENCE 3 [SEQ ID NO: 5]
agatacgtac tgaaatgg

SEQUENCE 4 [SEQ ID NO: 6]
cctgtgattc gcataagc
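
The relationship between a nucleotide sequence (SEQUENCE 1) and its deduced amino acid sequence (SEQUENCE 2) can be checked by translating a stretch of the DNA with the standard genetic code. The sketch below is an illustration only: it assumes the standard codon table applies, reads a single frame, and reports any codon containing an unread base (such as N) as X. The fragment used is positions 401-450 of Gene #2 SEQUENCE 1 as printed in Table 1.

```python
BASES = "TCAG"
# Standard genetic code, amino acids listed in TCAG order for each codon position.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna, frame=0):
    """Translate one reading frame; codons with unread bases (e.g. N) give X."""
    dna = dna.upper().replace(" ", "")
    protein = []
    for pos in range(frame, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[pos:pos + 3], "X"))
    return "".join(protein)

# Positions 401-450 of Gene #2 SEQUENCE 1, as printed in Table 1.
fragment = "CTCGAAATCC AAATACGAAC GTTTGATATG CACGAAATTG CTGAGCATGG"
peptide = translate(fragment)
print(peptide)
```

The translated fragment reads LEIQIRTFDMHEIAEH, which agrees with the corresponding stretch of Gene #2 SEQUENCE 2 (LEIQIRTFD MHEIAEH).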



Gene #3 
Staph FemB



SEQUENCE 1 [SEQ ID NO: 7]

1 GTGATGTGGC TAAACGCTTA AATGCAAATA TATATGTGTC TGGCGAAGGT 

51 GAAGATGCAT TAGGGTATAA AAATATGCCA TCAAAAACAC AATTTGTTAA 

101 ACATGGAGAT ATCATTCAAG TAGGCAATGT TAAATTAGAA GTTCTGCATA

151 CTCCAGGACA CACGCCTGAA AGTATTAGCT TTTTACTCAC TGATTTAGGT 

201 GGTGGNTCAN GTGTTCCGAT GGGATTATTT AGTGGTGACT TTATTTNTGN

251 TGGTGATATA GGTAGACCTG ATTTATTAGA AAAATCTTGT TCAAATAAAG 

301 GGTTCGGCAC GAAATTAGCG CGAAACAAAT GTATGAGTCC GATCAAAATA 

351 TTAAAAATTT ACCAGACTAT GTTCAAATCT GGCCGGGTCA TGGTGCTGGA

401 AGCCCTTGTG GTAAAGCATT AGGTGCCATA CCTATATCTA CAATAGGTTA 

451 TGAGAAAATT AATAACTGGG CATTTAATGA AATTGATGAG ACTAAATTTA

501 TTGNNTCATT AACATCAAAT CAACCAGCAC CACCNCATCA TTGTGCACAA 

551 ATGAAACAAG TTANTCAGTG TGGCATGAAT TTATNTCAAT CATATGATGT 

601 TTATCCNAGC TTAGATNATA AGAGAGTAGC ATTTGATCTT CGCGTAGCAA

651 AGAGGGCTTT CACGGGTGGC CACACAAAAG GAACAATCAA TATACCATAC 

701 AACAAAAACT TTATTANTCA ANTTGGGTGG GTACTTAGAT TNTGAAAAAG 

751 ATATAGATTT AATTGGAGAT AAATCTACTG TTGAGAAAAG CGAAACACAC 

801 TTTACAATTA ATTGGGTTTG ATAAGGTAGC AGGCTATCGT NTGCCAAAAT 

851 CAGGCATTTC ACCCCAGTCC GNTCATAGCG CTGATATGAC AGGTAAAGAA 

901 GAACATGTAT TAGACGTACG TAATGATGAA GAGTGGAATA ATGGACACTT 

951 AGNTCAAGCA GTTAATATTC CACATGGTAA ATTATTAAAT GAAAATATTC 

1001 CTTTTAATAA AGAGGATAAA ATATATGTAC ATTGTCAGTC AGGTGTTAGA 

1051 AGNTCAATTG CAGTGGGGTA TATTGGGAAA GCAAAGGCTT 
SEQUENCE 2 [SEQ ID NO: 81]

1 DVAKRLNANI YVSGEGEDAL GYKNMPSKTQ FVKHGDIIQV GNVKLEVLHT
51 PGHTPESISF LLTDLGGGSX VPMGLFSGDF IXXGDIGRPD LLEKSCSNKG
101 FGTKLARNKC MSPIKILKIY QTMFKSGRVM VLEALWKH*

SEQUENCE 3 [SEQ ID NO: 8]
ttcgggtgtt ttaccttc

SEQUENCE 4 [SEQ ID NO: 9]
tgcagcaagc cttttctc

Gene #4

DiCitrate Binding Protein

SEQUENCE 1 [SEQ ID NO: 10]

1 AGCAGAATCT TTTTTAGCAT GATCTGTCAT AATGATCATA CGCTCTGGAT
51 TTAAATCAGC TAAATGTTCA GTGTCTAATT GTAAGTAAGG TCCTTTCAAA
101 TATTTACTTA AACCTTGTGT TACATCGTCA CTTAATGCAT TTTTAAATCC
151 TAGNTCGTTT AAAAATTGTC CAACATATGA ATAGTGTGGA TGTGCTAATA
201 AACCAGCTTT AGCAACTACT GCTGGAAGCA CTTTGTGATT TCTATCAAAT
251 TTAATTTCAT CTTTATACTT ATTGATTAAT TTATCATGCT CAGCAAGACG
301 TTTNNCGCCT TCTTTNTCTT TATTTAAAGC TTTAGCAATT GTTGTTGAAC
351 GAATTAATAT TGTGGGTGTA GTCTCCATCA AAACTCTTTA ATGATAATGT
401 GGTGCAATGT GGGCTAATTC TTTATTAATA CCCTTATGTC TACTGCTATC
451 AGNGATAATT AATCCCGGNT TTAATTTACT AATNTCTCTT AAGTTNGCTT
501 GTTACGTGTA CCTACAGAAG TATTACCCCC AATTTTTCTC TTACTGGGTT
551 ATGATACGTT TTTTCTTACC ATCATCAGCA ATACCAACTT GGTNTAACGG
601 CTATATGCTG NTAATGCAAC CTTGCAAATG AGTACTCTAA TACAACGATA
651 CGTTGTGCAT CTTTAGGTAC TTTTACTGTA CCATTTTCAT CTTTTACCCG
701 AAATAGTATC TTTAGTTGAT GATTCTTCTT TTACTTGAAT TATCCGTATT
751 ACCACAAGCT GCAACTAAAA GTAAGGCAAC TATTAATCCC AATATACTAA
801 AAGTTTTTAG ACCTCTCATC NGTCCCACTC CTTAATATGT ATANCTTCAT
851 TTATTATTTT ATTGATAACA ATTATCATTG TCAAGTAGCG TTCAATCTTT
901 TTTATATTTC TAAAATGTAT GACTATATAT TTCCTCTAAT AATTATGACT
951 ACAATTAGCA CATTTCCTTA GACAAAATAC TGATAATGTA TCATTGCTAT
1001 ATCATCTTTG CATTAATACA ATTGACACCA CTTAGCATGA CCGNTATCCC 
1051 TGTAATTCAG CTGATATTAT CTGTTGCAAT TTTATGTGAC GAACTGTTGC 
1101 ACTTAATTTG ATAANTCAAC AANTACAANA NATCTAAGTT GAACAATTAT 
1151 GATACAACCG TGCAAACGAT ATGTAGTATA ACTTGTCAAC TTAGAATTAT 
1201 TGATAAATAT ATTAATATTG GTTTACCATA GCAGGAGATT TCACATCAAA 
1251 ATTTTGAAGT AGCGTATCAA TCTTTGAATC ATCAATATAT ACCTTATGTA 
1301 AATTTTTCAT ATACATCGAA TGAGAAAGTG CTTCATAATT TAATGAAAAA 
1351 GATATATGAT CTCCAACTTG ATAGTGTCCT TGACCATTTA AATCAAGCAT 
1401 TAAATGATCA CTCGAAGCGC CTAAAATATT GATATGCTGA TCCATAGGTG 
1451 AAATATTATC GACTTGTGTA TCTNAAATAA CCAATATCTA CAATAGCTTG 
1501 TAAGAATGAT TCATGCGTGT GTGTATTAAC TCGAGGTTTA ATTTCTAAAA 
1551 TCTCAGCCTC CAATGTAATC GCATCTTGAT ATAACATAGC GAATCGCTTG 
1601 ATTTGCGTTG TTTCAACAAC TCTAAACAAC GTNTCANCTA TTCGGAANTC 
1651 AATTTATTTT TACCCAAATC AATATATAAA AGGTGGGGGG NAACATGCTC 
1701 CGAATTACCA CCCGGAAATA ATTTNCANTC GATATCCTAT TTCTCTTNCA 
1751 ACAGCTGAGA CGAATCGATT AATCATAAAG ATATCANCAC CACTTGGCGC 
1801 ATCAGATTTA AAACACATAA AATTGAATGC TAAACCTACA AAATGGATAT 
1851 TTTNCAAGTG AATAATCTCT TTANTATAAT CTAAAACATC ATAAGTCAGA 
1901 ACACCTTCAC GGACATCTTT CCAATCTACC ATTAATAAAA TCTTATGTTT 
1951 TTTTCCTAAA ACTTCTGCTA CTTCATTTAT NTGATGTATG GTAGATAATT 
2001 CTGTGTGGAT ACTCATATCA ACTTTCCTCT ATCATATCTG AAATCTCTTT 
2051 TGNGGGAGGC GTACGCAATA ACGTATATGT TAAATCCTGA TCTGCAATAC 
2101 TAATTATGTT ATCCAATCTG GATTCTGCAA CATGATTGAT ACCTAACGCT 
2151 TTTAAGCTTN CTACAATGGT ACGGGCANCA GCTATACACT TAATTACTGG 
2201 TGTGANTNGN ATATTTTTAC TTTGAAAACT NNGTGGAGGT ACTTGGG 

SEQUENCE 3 [SEQ ID NO: 11]
tgtaagtaag gtcctttc

SEQUENCE 4 [SEQ ID NO: 12]
taatacttct gtaggtac
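
A primer pair can be checked against its gene sequence by locating the forward primer directly and the reverse primer as its reverse complement, which also gives the predicted product size used when reading the gels. The sketch below does this for Gene #4. It assumes that the two 50-base rows quoted (starting at positions 51 and 501 of SEQUENCE 1) have been read correctly from the printed table, and the helper names are illustrative only.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (N stays N)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}
    return "".join(pairs[base] for base in reversed(seq.upper()))

def clean(seq):
    """Remove the spaces between 10-base blocks and upper-case the sequence."""
    return seq.replace(" ", "").upper()

# Two 50-base rows of Gene #4 SEQUENCE 1 (start positions 51 and 501).
row_51 = clean("TTAAATCAGC TAAATGTTCA GTGTCTAATT GTAAGTAAGG TCCTTTCAAA")
row_501 = clean("GTTACGTGTA CCTACAGAAG TATTACCCCC AATTTTTCTC TTACTGGGTT")

forward = clean("tgtaagtaag gtcctttc")  # SEQUENCE 3 [SEQ ID NO: 11]
reverse = clean("taatacttct gtaggtac")  # SEQUENCE 4 [SEQ ID NO: 12]

fwd_offset = row_51.find(forward)
rev_offset = row_501.find(reverse_complement(reverse))
assert fwd_offset >= 0 and rev_offset >= 0, "primer not found in the printed rows"

fwd_start = 51 + fwd_offset                     # first base of the forward primer
rev_end = 501 + rev_offset + len(reverse) - 1   # last base covered by the reverse primer
product_size = rev_end - fwd_start + 1
print(fwd_start, rev_end, product_size)
```

The same check can be repeated for other genes in Table 1; where a primer does not match exactly, the difference may reflect a misread base in the printed sequence rather than a real mismatch.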

Gene #5 

Staph enterotoxin etxA 



SEQUENCE 1 [SEQ ID NO: 13]

1 GGCACGAGCG GCACGAGCGT GTTGTATCAA GATTTTGTAG GCAGTTTTAC
51 AACGTCCGAT TCAGCAAGTT ATGCACAAGA TTTTAAATCT GAGGAAAACG
101 CTAAAAAGAT TGCTGAAACT TTAAATCTTT TATATCAATT AACAGGCAAT
151 CAAAACGGTG TGAAAGTTGT GAAAGAAGTT GTGGATAGAA CTGACTTGTC
201 ATCTGATAAA TCAGTTGATA GCGAAACAAT GTAACTATAC TAAGTTATGA
251 GCATTACGCT CATAGCTTTC TTAGAAAGTA GGTGTAGTTT TGGATGATAT
301 TCAGAAAATA AAAAAAGAGC TTTCTGAATT AGTTGAACGT GTTGATGATG
351 TTGAAATACT AGCAAACGAA ACAGCTGATC ATGTGCTTGA ACTTAGAGAG
401 GAACATAAGC AACATCATAA TGAACTAAGA GAATCTCATA AAGAACTTAA
451 AGATAAGCAA GATAAAGTTG TAGATGAGAA TTTAGAGCAA ACAAAGATAT
501 TAAACAGAAT TGAAGAAAGA TATCANACGC AAGTAGNTGT TGNGCAAAAA
551 AATGAAGAAA AGACACTCGC CCAAAATAAA TGGCTCGTAG GTGCCATATG
601 GGCGCTTGTA ACAATTGTTA TGATTGCAGT CATTACTGCA TCAATTNCTG
651 CGTTATTACC TTAAGGGAGG TGGACATAAT GAGTTGGGCA AGATGGTTAT
701 CATGTTATTT GTNTGGTCGT AAATGTAAAT AATGTTTTTG GTCAGTGCAT
751 CGGCACTGGC TTTTTATTTT GATTGAAAAG AGGTACGTAC ATGGTATTAC
801 ACAGCTCACA AGACAGGAAG CATACTCCAA GTGAAGTTGG GAAGTGTTGT
851 TAATACCAAG TAAGTAGGAT ATCTGANATG TATAATAGAG TAAAAATGAA
901 ATCTTTTTAT TATAGACACA TATAAAAAGT GTATAGTAAT ATATGTATGT
951 ATAATTAAAT GATAATCATT TCATAATTAT TGTATATAAC TAAATAACTA
1001 CTTAACANAA ATAATTATGC TTTAGAGNTG ACCANNATGA NNNANNCCAG
1051 CATTTACATT ACTTTTATTC ATTGCCCTNA CGTTGACNAC AAGTCCCANT
1101 TGTAAATGGT AGCGAGAAAA GCGNAGNAAT AAATGCGAAA GATTTGCGAA
1151 AAAAGTCTGA ATTCCAGGGN ACAGCTTTAG NCAATCTTAN NCANATCTAT
1201 TATTACNATG NNANAGCTAN AACTGAAAAT AAAGAGAGTC CNCGACCACA
1251 TTTTTACAGC ATACTATATT GTTTANAGGC TTTTTTACAG ATCATTCGTG
1301 GTATANCGAT TTATTAGTAG ATTNTGATTC NNAGGATATT GTTNATAAAA
1351 ATAAAGGGNA AANAGTAGAC TTGTATGGTG CTTATTATGG TTATCAATGT
1401 GCGGGTGGTA CACCACACAA AACAGCTTGT ATGTATGGTG GTGTAACGTT
1451 ACATGATAAT AATCGATTGA CCGAAGAGAA AAAAGTGCCG ATCAATTTAT
1501 GGCTAGACGG TAAACANAAT ACAGTACCTT TGGAAACGGT TAAAACGAAT
1551 AAGAAAAATG TAACTGTTCA GGAGTTGGAT CTTCAAGCAA GACGTTATTT
1601 ACAGGAAAAA TATAATTTAT ATAACTCTGA TGTTTTTGAT GGGAAGGTTC
1651 AGAGGGGATT AATCGTGTTT CATACTTCTA CAGAACCTTC GGTTAATTAC
1701 GATTAATTTG GTGCTCAAGG ACAGTATTCA NATACACTAT TAAGAATNTA
1751 TAGAGATAAT AAAACGATTA ACTCTGAAAA CNTGCGTAG

SEQUENCE 2 (Short) [SEQ ID NO: 82]

1 MYGGVTLHDN NRLTEEKKVP INLWLDGKXN TVPLETVKTN KKNVTVQELD
51 LQARRYLQEK YNLYNSDVFD GKVQRGLIVF HTSTEPSVNY D*

SEQUENCE 3 [SEQ ID NO: 14]
atcccctctg aaccttcc

SEQUENCE 4 [SEQ ID NO: 15]
aaatggtagc gagaaaag



Gene #6 
Staph Lipase Precursor

SEQUENCE 1 [SEQ ID NO: 16] 

1 TCAAATGCAG TCAGGGAAGC AATAGGACGA TATGCATAAA GGAGATGGTA 

51 AAGTGGAACA GTGACAGAAG GTAAAGACAC GCTTCAATCA TCGGAGNCAT 

101 CAATCAANCA CAAAATAGTA AAACAATCAG GAACGCAAAA TGATAATCAA 

151 GTAAAGCAAG ATTCTGGAAC GACAAGGTTC TAAACAGTCA CACCAAAATA 

201 ATGCGACTAA TAATACTGAA CGTCAAAATG ATCAGGTTCA AAATACCCAT 

251 CATGCTGAAC GTAATGGATC ACAATCGACA ACGTCACAAT CGAATGATGT 

301 TGATAAATCA CAACCATCCA TTCCGGCACA AAAGGTATTA CCCAATCATG 
351 ATAAAGCAGC ACCAACTTCA ACTACACCCC CGTCTAATGA TAAAACTGCA 
401 CCTAAATCAA CAAAAGCACA AGATGCAACC ACGGACAAAC ATCCAAATCA 
451 ACAAGATACA CATCAACCCG CGTGCCTCAA ATCATAGATG CAAAGCAAGA 
501 TGATACTGTT CGCCAAAGTG AACAGAAACC ACAAGTTGGC GATTTAAGTA 
551 AACATATCGA TGGTCAAAAT TCCCCAGAGA AACCGACAGA TAAAAATACT 
601 GATAATAAAC AACTAATCAA AGATGCGCTT CAAGCGCCTA AAACACGTTC 
651 GACTACAAAT GCAGCAGCAG ATGCTAAAAA GGTTCGACCA CTTAAAGCGA 
701 ATCAAGTACA ACCACTTAAC AAATATCCAG TTGTTTTTGT ACATGGATTT 
751 TTAGGATTAG TAGGCGATAA TGCACCTGCT TTATATCCAA ATTATTGGGG 
801 TGGAAATAAA TTTAAAGTTA TCGAGGGAAT TGAGAAAGCA AGGCTATAAT

851 GTACATCAAG CAAGTGTAAG TGCATTTGGT AGTAACTATG ATCGCGCTGT 
901 AGAACTTTAT TATTACATTA AAGGTGGTCA CGAGCGTAGA TTATGGCGCA

951 GCACATGCAG CTAAATACGG ACATGAGCGC TATGGTAAGA CTTATAAAGG 
1001 AATCATGCCT AATTGGGAAC CTGGTAAAAA GGTACATCTT GTAGGGCATA 
1051 GTATGGGTGG TCAAACAATT CGTTTAATGG AAGAGTTTTT AAGAAATGGT

1101 AACAAAGAAG AAATTGCCTA TCATAAAGCG CATGGTGGAG AAATATCACC 
1151 ATTATTCACT GGTGGTCATA ACAATATGGT TGCATCAATC ACAACATTAG

1201 CAACACCACA TAATGGTTCA CAAGCAGCTG ATAAGTTTGG AAATACAGAA 
1251 GCTGTTAGAA AAATCATGTT CGCTTTAAAT CGATTTATGG GTAACAAGTA 
1301 TTCCGAATAT CGATTTAGGA TTAACGCAAT GGGGCTTTAA ACAATTACCA

1351 AATGAGAGTT ACATTGACTA TATTAAAACG CGTTAGTAAA AGCAAAATTT 
1401 GGACATCAGA CGATAATGCT GCCTATGATT TAACGTTAGA TGGCTCTGCA 
1451 AAATTGAACA ACATGACAAG TATGAATCCT AATATTACGT ATACGACTTA 
1501 TACAGGTGTG TCTTCACATA CTGGTCCATT AGGGCACGAA AATCCTGCCG 
1551 AATTAGGCAC GAGACATTTT TCTTAATGGA TACAACGAGT AGAATTATTG 
1601 GTCATGATGC AAGAGAAGAA TGGCGTAAAA ATGATGGTGT CGTACCAGTG 
1651 ATTTCGTCGT TACATCCATC CAATCAACCA TTTATTAATG TTACGAATGA 
1701 TGAACCTGCC ACACGCAGAG GTATCTGGCA AGTTAAACCA ATCATACAAG
1751 GATGGGATCA TGTCGATTTT ATCGGTGTGG ACTTCCTGGA TTTCAACACC 
1801 GTAAGGTGCA GAACTTGCCA ACTTCTATAC AGGTATAATA AATGACTTGT 
1851 TGCGTGTGGA AGCGNCTGAA AGTAAAGGAA CACAATTGAA AGCAAGTTAA 
1901 ATTCATCTTC TGAATTTAAT AGGCTATGTA AATCGTGCTG TTATCATGGC 
1951 ACATCAGATA TAAGTAGCAT CACAGTGTTG AATCTCAAAA TAGTAAAGTG 
2001 AAATAAAGCG CCTGTCTCAT TAGCGAAAAC TAAAGGGACA GGCGTATCTG 
2051 TTTATGAGCT TAATAAATTG TATGAATAAT ATGGTTGATC GAATAACTGT

2101 TTATCATTGA TGATAAATTT GAGTTTTTTA AAAATAATTG ATATATTACA 
2151 CCATTGTTAT AGCGTTTAAA GAAATCAACC CAACTTTACG ATAAATAGTG 
2201 ATTGCTTCGT CATTAGGTCT ACGATCAAAA TCATGCTCGT TTTTATTCAC 
2251 GCGTTCAAAT GTTGAATGTG GAACATGATT CATGATATGT TCGCTTTCCT 
2301 CAACGGGAAC ATCATAATCG CCATTACAAT GCGCAATGAA AACAGGTGGA

2351 AGTGTTTTAA GNTCATCTGG TGCAATATTA TATTTTGAAT CAGTATAATC 
2401 ANCAATGTTA ATCATATTTA TCCATTTACC TGTGCCACGT GCATAAACGT

2451 AGAGTAAAAA ACGTGTGCGA TTTGATCTTG ANCAACCGGT GTTGGTGAAG
2501 TGAGTTGTCC AATCATTGTT TCGTTTATGC TTTGAGCTAT TTTTGCGTAA 
2551 TACCTATTAG TTGTTTTAAA AGGGTTCAGT GTTGATGCGA CTATAACCAT

2601 AAAAATCAAT AACACCATCA ATATCTCTGT CTCGTGCAAT TAATAAGACT 
2651 TAAATATGCA CCTGATGATC TGCCAAAGGT AAAAATAGGG CAATTAGAAT 
2701 ATTGTGATTG AATCGCATCG AATGATGCGT AGACATCCTC AATAATGCAA 
2751 TCGAGACTTA CTTCTGGTAA TAAACGATAA CTTAGTTGAA TTAAATCGTA
2801 ATGTTCCGTA AGGATATCGA TATACTGTGG GGATAAATCG TTAGCTTTAC 
2851 CGAACATTAA TCCACCACCG TGGATGTAGA CAATAACGCC TTTTGTTGGT 
2901 TGATTTTTTG CTTTAATAAT TGTGTAAGGT AATGCAAATG CATCTTTAGT 
2951 AATTACTTTA TATTTAATTT CAGTCACGAT TTAATAGGCT CCTTAGGAAT 
3001 CCGATATTGA TGTCATTATA ACACTGTCNT NAATTTCCAT GNAAAATAGT 
3051 CTTAAGACGA TGAGTCATGA TAATTCTGTT CCAATTGACG TAAAGCGTCN 
3101 CGGGTATGCT TCTTTAGACC TTCCCCATAA TCCATCATTT TAACAATATC

3151 TTTAAAAGCA GCATGTGGNA TGGCTAAATC TTCTAAATCT GCCATAGAAA 

3201 ATTCAAGATT GATATCATGT GGTCGCTGTT CAGCAAGTTT ATGCACAAAG 

3251 TCAGGTTCTG TGACCAAAGG CGAAGACATG CCGACCATAT CTGCATGTTG 

3301 TAAAGCATCT AAAGCAGACT CTGGAGAATT AATCCCGCCA CTTGCAATTA 

3351 AAGGGATACG ACCTGCTAAA TGTTCATAGA CAATTTGGTT AACTGGTCGA 

3401 CCGAAATGAT CACCTGGTGT ACGAGACGTA TTTTGATAAA TATGTCGACC 

3451 CCAGCTAGCG ATTGCTAAGT ATTGGATGTT TGAAACGTCC ATGACCCAAT

3501 CGATTAATTG GTTGAACTCG TCAATGGTAT ATCCTAAATC ACTGCCTCTG 

3551 GTTTCTTCTG GCGTTGCTCG AAATCCTAAA ATAAAATTGT CAGGTGCTTC 

3601 TTTATCAATC ACTTCTTGTA CCGCACGCAT AACTTCTAAA CATAATCTTG 

3651 CACGATTTTT TAATGAGTCG GCACCGTAAT GGTCTGTACG TCTATTTGAA 

3701 AAAGTTGAGA AAAATGTTTG AATCAGCAAA CGTTGTGCAA TCGAAATTTC 

3751 CACACCATCA AAACCTGCTT TAATCGCGCG TGCATCGAGC TCGTGCC 

SEQUENCE 3 [SEQ ID NO: 17] 
gactaataat actgaacg 

SEQUENCE 4 [SEQ ID NO: 18] 
tctgtcggtt tctctggg 

Gene #7 

Fatty Acid Oxidation Complex Subunit 

SEQUENCE 1 [SEQ ID NO: 19] 

1 CAGGCGTTTC CTCNGGTACN TGTTGCNNGC CTTTAATTAC CGACNCTGCA 

51 ATANCCAAAC CGACCAGGTC GGATAGGGNA TATGTACCTG TTTTAGGACG 

101 ACCAATCGCT TGCCCAGTTA AAGCATCCAC ATCTACNATG CTTANCTTGT 

151 GTTGCTCGGC GCGATACAGA ATATCATTCA TTGTGTGCGT GCCGACTCTA 

201 TTTGCGACAA AGCCAGGCAC ATCATTGACG ACAATGACAC CTTTACCTAA 

251 TACATTGTGC GCGAAATTTT TTACATCTAA TATGATAGAT TCCTTCGTGT 

301 GTGACGTAGG TATTAACTCC ACTAATTNCA TAATACGTGG TGGGTTAAAG 

351 AAATGTAGAC CAAAGAATCG CTCTTGATCC TTCTCGTTAA ATGCTTGAGC 
401 AATCGCATTA ATTGGGATTA CCTGATGTAT TTGTAGCAAA TAAAGCATCT
451 TCTNTAGCAT GTTGTAGAAC TTGTTGCCAA ACAGCATGCT TAATTTCAAT 
501 ATCTTCTTTG ACTGCTTCGA TATATAAATC AGNATCATCA TTTACCAAGT 
551 CATCATCAAA ATTACCATAT GTTAAATGAC TCACTAGATT TAAGTCGAAT 
601 AGTAGCGGCC GTTTCTTATC TGTAATTTTA TCGTAAGATT TTTTCGCAAT 
651 GAGATTTGGA TCGTTTGTGT CCACTACAAT ATCTAATAGT TTTACTTTAA 
701 GTCCAGCATN CACAAAGAGT GCTGCCAGTT GAGCGCCCAT CGTGCCTGCG

751 CCAAGAACGG TTACTTTATT AATTGTCATA GTGATTCCTC CAATTTAGGT 
801 GAGGATAAGA TAACCATTAA GATAATTGGA ATAACGNTGC TATTTTATNA 
851 AATTAATTAA GTATCTTTGA CAAGACATCT CAGNCTCTTT ATTTTAAGGA 
901 AAAAGCTTTA TGCTTAAAAT AAGTCTTTTT TAGTGAAATT AATGCATCTC 
951 ATATAATTAT TTGCTATTTA TACGAAAGCA GAATCTCCAG TCAAAGCGCG

1001 TCCAATTACT AAGGCATTAA TTTCATGTGT ACCTTCGTAC GTGTAAATCG 
1051 CTTCTGCATC AGAGAAGAAA CGTGCAATAT CATAATCGTC AGCTAGTATG 
1101 CCATTACCAC CTGTAATACC GCGGCCCATA GCTACTGTCT CACGCAAACG
1151 TAAGGCATTC ATCATCTTCG CCGGTGAAGT TGCAACCTCG TCATATTCAC
1201 CATGTGCTTG CATATTAGCT AATTGAGCAC ATGTTGCCAT TGCTTGAGCT

1251 AAATTACCTT GCATCATTGC TAGCTTNTCT TGTATTAACT GATATTTACT 
1301 AATTGGGTNT GCCGAATTGC TTACGCTCAA GTGACATAAT CTAATGTGGC 
1351 ACGTAAAGCG CCAGCCATAC CACCTGTAGC CATATAAGCA ACGCCTGCTC 
1401 TCCGGTGGAA TAAAGAATTT TG 
SEQUENCE 2 [SEQ ID NO: 83] 

1 MLXKMLYLLQ IHQVIPINAI AQAFNEKDQE RFFGLHFFNP PRIMXLVELI
51 PTSHTKESII LDVKNFAHNV LGKGVIVVND VPGFVANRVG THTMNDILYR
101 AEQHKXSXVD VDALTGQAIG RPKTGTYXLS DLVGLXIAXS VIKGXQXVPE

SEQUENCE 3 [SEQ ID NO: 20]
atgtacctgt tttaggac

SEQUENCE 4 [SEQ ID NO: 21]
gagtcattta acatatgg



Gene #8

ATP DEPENDENT RNA HELICASE DEAD

SEQUENCE 1 [SEQ ID NO: 22]

1 ATACTTTGAT TTTAGATGAA GCTGATGAAA TGATGAATAT GGGATTCATC
51 GATGATATGA GATTTATTAT GGATAAAATT CCAGCAGTAC AACGTCAAAC
101 AATGTTGTTC TCAGCTACAA TGCCTAAAGC AATCCAAGCT TTAGTACAAC
151 AATTTATGAA ATCACCAAAA ATCATTAAGA CAATGAATAA TGAAATGTCT
201 GATCCACAAA TCGAAGAATT CTATACAATT GTTAAAGAAT TAGAGAAATT
251 TGATACATTT ACAAATTTCC TAGATGTTCA TCAACCTGAA TTAGCAATCG
301 TATTCGGACG TACAAAACGT CGTGTTGATG AATTAACAAG TGCTTTGATT
351 TCTAAAGGAT ATAAAGCTGA AGGCTTACAT GGTGATATTA CACAAGCGAA
401 ACGTTTAGAA GTATTAAAGA AATTTAAAAA TGACCAAATT AATATTTTAG
451 TCGCTACTGA TGTAGCAGCA AGAGGACTAG ATATTTCTGG TGTGAGTCAT
501 GTTTATAACT TTGATATACC TCAAGATACT GAAAGCTATA CACACCGTAT
551 TGGTCGTACG GGTCGGTGCT GGTAAAGAAG GTATCGCTTG TAACGTTTGG
601 TTAATCCAAT CGAAATGGAT TATATCAAGA CAAATTGAAG ATGCAAACGG
651 GTAGAAAAAT GAGTGACTCC GCCACCTCAT CGGTAAGAAG TACTTCCAAG
701 CACGTGAGGA TGACATCAAA GGAAAAGGTG GAAACTGGAT GTCTTTAAGA
751 GTCAAGAATC ACGCTGGAAA CGCATTCTTC AGAGGTGGGT AAATTGAATT
801 TTACGATGTG G

SEQUENCE 3 [SEQ ID NO: 23]
gatgaagctg atgaaatg

SEQUENCE 4 [SEQ ID NO: 24]
tatctagtcc tcttgctg

Gene #9

PHOSPHORIBOSYLAMINE GLYCINE LIGASE

SEQUENCE 1 [SEQ ID NO: 25]
1 TAATTCGCAA TAGGAGTGAT GAATATCATA AATTTTACCC TCCAAATGAA
51 GCTAATGAAG TCCTGGACCC GAGTMGACG CATGTAGCCA AGCTAAAATA
101 ATCCACTCTA CCTTATCTTT AGTTAATAAT GTTACTAAAT GTTGTTCATA 
151 CGCTGCTTTT GAATCAAATT GTTTTGGTTC ATTAATATAA ACAGGAATAT 
201 CGTGCTTGTT TGCTCTATCT ATACAAAACG CATTTTGATG ATCCGTATAT 
251 AGCNCCGTAA CTTCAATATT TTCAAGTTTT CCTGATTCAA CATGCTCAAC 
301 TATATTTTCA AAGTTACTTC CTGAACCTGA TGCAAAAATC GCAATTTTAA 
351 CCATTGTTAT ACCCCCAACA ATTCAATTGC AGTTGACTCA TTTTTCACAA 
401 TATGACCAAT TTGATAAGCT TCCACATTTT GTTCTGCTAA AATCTTCAAA

451 GCGCGTCGAT GCATCTTTTT CATCAACGAT AACCGTATAG CCAATACCCA 
501 TGTTAAAAAT GTTATACATT TCATTTGTGT CTATATTGCC TTGTTGTTGT 
551 AACCAATCAA ATATTTTTGG CGTTGGAAAT GATGTAGTAT CAATTCTAGC 
601 AGCATATCCG GCTGGCAATG CACGTGGAAT ATTTTCATAA AAACCTCCAC 
651 CAGTAATATG ATTCATTGCC TTAATAGAAA CTTCTTTTTT TAAAGCAAGT 
701 ACAGGTNTGA CATATAATTT AGTTGGCTCT AAAAAGACAT CTATAAATGG 
751 ACGATTATCG NAGGGTGATG CCAAATCAAT GNCTGATTCA NTAATTAATN

801 TGCGCACTAA ACTGTNTCCA TTNGANTGAA TGNCACTTGG ACGCAAGTCC 
851 TATAACAACT TGGCCCTCTT NCAATTCTTG AACCATCTTA CAATAGNCAA 
901 CCTTTTTCAA CTGCTCCAAC AGCAAATCCG GCTACATCAT ATTCACCTTC 
951 GTGATACATT 

SEQUENCE 3 [SEQ ID NO: 26] 
ataagcttcc acattttg 

SEQUENCE 4 [SEQ ID NO: 27] 
gataatcgtc catttata 

Gene #10 

Methanobacteria formate dehydrogenase 

SEQUENCE 1 [SEQ ID NO: 28]

1 GGCACGAGCG CTAAATAATT AATATTTAGT TTTTAAGTTA TTAATAACGT 

51 AGGGATATTA ATTTTAAAAG AAGCAGACAA AATGGTGTTT GCTTCTTTTT 

101 TATGTCGTAT AAGTAATAAA TAAAACAGTT TGATTTTAAA ATGAAAGCGT 
151 AAAAATGGTA AAATATCCCA AAATTGATTG TGATATAATT ATAAGGAAAA

201 TGAGCAATTT ATGAAAAAAG TTTACGNACA AATCGGAGAA TTAAAACTAA 

251 ATAATTATCA AAACAACGTC AATATTTAGT TGAATACTCA GACTTTAGCC

301 CATGGCCAAG TGGGGAAGAC AGCATATATT AGTAAAGGTG AATGATTTGT 

351 TATTACTCAC TCGAAAATAG AAAGACAAGA TTTTAACGAT TAAAATAAAC

401 TATTTTACAA ATAAAGTAAA ATTAATTTAT TANGCTAATA ATGCAAAAAA 

451 TTAAAAAGTA ATGGACAAAG AGATAATGAT ATGGCTCAAG AGGTAATAAA 

501 ATAGAGGTGG ACGCACACTA AATGGGGAAG TTAATACAAG G

SEQUENCE 3 [SEQ ID NO: 29]
gcacgagcgc taaatttg 

SEQUENCE 4 [SEQ ID NO: 30] 
CTTCCCCATT TAGTGTGC 



Gene #11 
E.coli Nitrate Reductase



SEQUENCE 1 [SEQ ID NO: 31] 

1 CCACCCANCT GATTATAATG TTTTAGCANG AGCTAGACTT GGTTGGTTAC 

51 CATCATATCC ACAATTTAAT AAAAATAGTT TGTTGTTTGC AGAAGAAGCT 

101 AAAGATGAAG GCATTGAGTC GAATGAGGCA ATTTTAAAAC GAGCGATAAA

151 TGGAAGTTAA GTCAAAACAA ACGCAATTTG CGATAGAAGA TCCGGATTTG

201 AAAAAGAATC ATCCGGAAAT CACTGTTTAT ATGGCGCTCA AATCTAATCT 

251 CAAGTTCTGC AAAAGGTCAA GAATACTTTA TGAAGCATTT ACTTGGCACA 

301 AAATCAGGGT TATTAGCTAC ACCAAATGAA GATGAAAAGC CAGAAGAAAT 

351 TACGTGGCGT GAGGAAACAA CAGGGAAATT AGATTTAGTC GTTTCTTTAG 

401 ATTTCAGAAT GACAGCAACA CCTTTATATT CTGACATTGT TTTGCCAGCA

451 GCGACTTGGT ATGAGAAGCA TGATTTGTCA TCTACAGATA TGCATCCATA

501 TGTACATCCT TTTAATCCAG CTATTGATCC ATTATGGGAA TCGCGTTCAG 

551 ACTGGGATAT TTATAAAACG TTGGCAAAAG CATTTTCAGA AATGGCAAAA 

601 GACTATTTAC CTGGAACGTT TAAAGATGTT GTGACAACTC CACTTAGTCA 

651 TGATACAAAG CAAGAAATTT CAACACCATA CGGCGTAGTG AAAGATTGGT
701 CGAAGGGTGA AATTGAAGCG GTACCTGGAC GTACAATGCC TAACTTTGCA
751 ATTGTAGAAC GCGACTACAC TAAAATTTAC GACAAATATG TCACGCTTGG 
801 TCCTGTACTT GAAAAAGGGA AAGTTGGAGC ACATGGTGTA AGTTTCGGTG 
851 TCAGTGAACA ATATGAAGAA TTAAAAAGTA TGTTAGGTAC GTGGAGTGAT 
901 ACAAATGATG ATTCTGTGAG AGCGAATCGT CCGCGTATTG ATACAGCACG 
951 TAATGTAGCA GATGCAATAC TAAGTATTTC ATCTGCTACG AATGGTAAAT 
1001 TATCACAAAA ATCATATGAA GATCTTGAAG AACAAACTGG AATGCCGTTA 
1051 AAAGATATTT CTAGCGAACG TGCTGCTGAG AAAATTCGTT TTTAAATATA 
1101 ACTTCACAAC CACGAGAAGT AATACCGACA GCAGTATTCC CAGGTTCAAA 
1151 TAAACAAGGT CGACGATATT CACCATTTAC AACGAATATA GAACGTCTAG 
1201 TACCTTTTAG AACATTAACA GGACGTCAAA GTTATTATGT GGATCACGAA 
1251 GTTTTCCAAC AATTTGGGGA GAGCTTACCA GTATATAAAC CGACATTGCC 
1301 GCCAATGGTA TTTGGGAATA GAGATAAGAA AATTAANGGT GGTACAGATG

1351 CTTTGGTACT GCGTTATTTA ACGCCTCATG GANAATGGAA TATACACTCA 
1401 ATGTATCAAG ATAATAAGCA TATGTTGACA CTATTTAGAG GTGTCCACCG 
1451 GTTTGGATAT CANATGAAGA TGCTGNAAAA CACGATATCC AAGATAATGA 
1501 TTGGCTAGAA GTGTATANCC GTAATGGTGT TGTAACGGCA AGAGCAGTTA 
1551 TTTCGCATCG TATGCCTAAA GGTACAATGT TTATGTATCA TGCACAAGAT 
1601 AAACATATTC AAACGCCTGG GTCAGAAATT ACAGATACAC GTGGTGGTTC 
1651 ACACAACGCG CCGACTAGAA TCCATTTGAA ACCAACACAA CTAGTCGGAG 
1701 GATACGCACA AATTAGTTAT CACTTTAATT ATTATGGACC AATTGGGAAC 
1751 CAAAGGGATT TATATGTAGC AGTTAGAAAG ATGAAGGAGG TTAATTGGCT 
1801 TGAAGATTAA AGCGCAAGTT GCGATGGTAT TAAATTTAGA TAAATGCATA 
1851 GGATGCCATA CGTGTAGTGT GACATGTAAA AACACTTGGA CAAATCGTCC 
1901 AGGTGCTGAG TAACATGTGG TTCAATAACG TAGAAACGAA GCCAGGTGTA 
1951 GGGTATCCGA AACGTTGGGA AGACCAAGAA CACTACAAAG GTGGTTGGGT 
2001 ACTAAANTCG TAAAGGGAAA CTTGAATTAA AATCTGGAAG TAGAATTTCA 
2051 CAAATTGCTT TAGGTAAAAT TTTTTATAAC CCAGATATNC CATTAATAAA 
2101 AGATTATTAT GANCCATGGA NCTATAATTA TGAACATTTA ACAACTGCGA 

2151 AATCAGGGAA GCATTCGCCA GTTGCTAGAG CGTATTCAGA AATTACAGGG 

2201 GATAACATTG AAATTGAATG GGGACCTAAC TGGGAAGATG ACTTAGCAGG 

2251 TGGTCATGTT ACAGGCCCAA AAGATCCTAA CATACACAAA ATAGAAGAAG 

2301 AGATTAAATT CCAATTTGAC GAAACTTTTA TGAG 

SEQUENCE 2 [SEQ ID NO: 84]

1 MKHLLGTKSG LLATPNEDEK PEEITWREET TGKLDLVVSL DFRMTATPLY
51 SDIVLPAATW YEKHDLSSTD MHPYVHPFNP AIDPLWESRS DWDIYKTLAK
101 AFSEMAKDYL PGTFKDVVTT PLSHDTKQEI STPYGVVKDW SKGEIEAVPG
151 RTMPNFAIVE RDYTKIYDKY VTLGPVLEKG KVGAHGVSFG VSEQYEELKS
201 MLGTWSDTND DSVRANRPRI DTARNVADAI LSISSATNGK LSQKSYEDLE
251 EQTGMPLKDI SSERAAEKIR F*

SEQUENCE 3 [SEQ ID NO: 32]
attgatccat tatgggaa

SEQUENCE 4 [SEQ ID NO: 33]
catattgttc actgacac

Gene #12 

E.coli ftsE (abc transporter) 

SEQUENCE 1 [SEQ ID NO: 34]

1 AGTTATTGTA TTTAAAAATG TTTCATTTCA ATATCAAAGT GATGCATCCT 

51 TCACATTGAA AGATGTTTCT TTTAATATAC CTAAAGGTCA GTGGACATCT 

101 ATTGTTGGTC ATAACGGTTC TGGAAAATCT ACAATTGNCA AGTTAATGAT 

151 TGGCATAGAG AAAGTTAAAT CTGGAGAAAT TTTTTATAAT AATCAAGCTA 

201 TAACTGATGA TAATTNTGAA AAGTTAAGAA AAGACATAGG AATTGTATNT 

251 CAGAATCCGG ATAATCAATN TGTTGGNTCA ATTGTAAAAT ACGATGTGGC 

301 ATTTGGACTC GAAAATCATG CGGNTCCACA TGACGAAATG CATAGAAGAG 

351 TCAGCGAAGC ACTTAAACAA GTTGATATGT TAGAACGTGC AGATTATGAC 

401 CCTAATGCAT TATCGGGGGG ACAGAAGCAG CGTGTGGCTA TAGCAAGTGT 

451 ATTAGCACTT AACCCTCTGT CATTATATAG ATGAGGCGAC TCTATGTTAG 
501 GATCCCTGAT GCACGTCAAA TTTATGGGAT TTAGNGAGAA AGTAANTCAG

551 ACATTATATA CAATCATTCT ATACGCATGA TTTATCTGAG GCGATGAGNA 

601 GATCAAGTAT CCGTATGATA AGGACTTNCT TTTAAGGC 

SEQUENCE 3 [SEQ ID NO: 35] 
gtttcatttc aatatcaa 

SEQUENCE 4 [SEQ ID NO: 36]
atctatataa tgacagag 

Gene #13 
B.subtilis secA 

SEQUENCE 1 [SEQ ID NO: 37]

1 GTTAATCAAG TATCGAAGCG GAACAATCAT ACTTTAATGT TGAAGATTTA 

51 TATNGCGAAC AAGCGATGGT CCTAGTGCGT AATATTAATT TAGCACTGCG 
101 CGCACAATAT TTGTTNGNAT CTNATGTCGA TTACTTTGTA TATNNTGGTG 
151 ATATTGTTTT AACTGACCNC ATTACAGGTC GTNTGTTACC GGNAACTAAG 
201 TTGCAAGCTG GACTTCACCA NGCTATTGAA GCGAAAGAAG GTATGGAGGT 
251 TTCAACAGAT AAAAGTGTTA TGCCAACCAA TTACCCTTCC AGAATTTATT 
301 TAAACTTTTT GAATCAATTT TCAGGTATGA CAAGCTACAG GAAAATTAGG 
351 CGAATCAGAG TTCTTTGATT TGTATTCANA AATAGTCGTA CAAGCACCCA 
401 ACTGATAAAG CGATTCAACG TATCGATGAA CCAGATAAAG TGTTTCGTTC 
451 AGTTGATGAG AAAAACATCG CGATGATTCA TTGATATAGT TGAACTTCAT
501 GANNCGGGGC CGACCGGTTT TACCTCATAA CCGAGNACTG CTGAAGCGGC 
551 TTGAATACTT TTCNGAAGTA TTATTCCAAA TGGATATTCC TAATAATTTA 
601 CTCATTGCGC AAAATGTTCC AAAAGAAGCG CAGATGATAG CTGAAGCAGG 
651 CCAAATTGGT TCCATGACTG TTGCGACTAG TATGGCAGGT CGAGGCACAG 
701 ATATTAAACT TGGTGAAGGT GTCGAAGCAT TAGCTGGATT AGCTGTTATT 
751 ATTCATGAAC ATATGGAAAA TAGCCGTGTA GACAGGCAAT TACGTGGTCG 
801 TTCTGGTAGA CAAGGGGATC CGGGATCATC TTGTATATAT ATTTCACTAG 
851 ATGATTATTT AGNTAAGCGA TGGAGCGATA GTAATTTAGC GGAAAATAAT 
901 CAATTATATT CANTAGATGC ACAACGATTA TCGCAAAGTA ATTTGTTTAA 
951 TCGNAAAGTT AAGCAAATTG TAGTTAAAGC GCAGCGTATC TCGGAAAGAA 
1001 CAAGGGGTTA AAGCTCGGTG AAATGGCTTA ATTGAATTTG NNAAAAAGCA 
1051 TNAGTATTCA GCGAAGATCT TNGTATTTAC GANGGAACGC AAATCCGAGT 
1101 TTTTAGAAAT TAGATTGATG CTGAGAATCC NAGATTTTTA ANGCGGTTAG 
1151 CTTAAAGATT GTATTTGAAA TNGTTTGGGG NAATGANGGA AANGGTGCTA 
1201 ACAAAATCGC GNGTTGGGCG AGTATATTTT ATCAAAAATT TAAGTTNCCA 
1251 ATTTAATAAA GATGTGGCTT GTGTTAATTT TAAAGATAAG CAAGCAGNAG 
1301 TGACATTTTT ATTAGAGCAA TTTGAAAAGC AATTAGCTTT GGANTCCGTA 
1351 AAAACATGCA ANGNGCATAT TATTATAATA TTNCCGGCCA AAANGTCTTT 
1401 NGGGAAAGCA ATTGATNCAA GTTGGGGTTA GGAACAAGTC GGCTTTTNAC 
1451 AACAANTTAA NAGCAAGCGN TAATCAAACG ACAAAANTGG CAACCT 


SEQUENCE 2 [SEQ ID NO: 85] 

1 MDIPNNLLIA QNVPKEAQMI AEAGQIGSMT VATSMAGRGT DIKLGEGVEA 

51 LAGLAVIIHE HMENSRVDRQ LRGRSGRQGD PGSSCIYISL DDYLXKRWSD 

101 SNLAENNQLY SXDAQRLSQS NLFNRKVKQI VVKAQRISER TRG* 


SEQUENCE 3 [SEQ ID NO: 38] 
ccgctaaatt actatcgc 





SEQUENCE 4 [SEQ ID NO: 39] 
ctgaagcggc ttgaatac 



Gene #14 

E.coli choline dehydrogenase 

SEQUENCE 1 [SEQ ID NO: 40] 

1 ATATAAATTA TTTAAGCGTA TGGTTTTACT TCGATTGCAC CCTTCATTTT 

51 CATCATTGAA CACCATGCTT AATATAATCC ATATATTTGT GGCTCTAAAG 

101 NCTTTCCTCC CACCGTATAA TGTCTGCTGC TTTTTCAGCT AACATTAAAA 

151 CAGGTGCGTG TATATTGCCA TTTGTCGTAC GTGGCATAGC GGATGCATCA 

201 ACTACACGTA AATTTTCCAT ACCGTGGACT TTCATTGTTA ACGGGTCAAC 

251 TACTGCCATT GGATNCTGAA GCAGGACCCA TTTTAGCACN ACAAGATGGG 

301 TGTAATNCTG TTTCACCATC TCNACGGAAN NCAATCAAGN ATTTCTTCGT 



351 CTGTTTGCAC TTCTGGGTCC TGGGTGAAAT TTCTCCACCA TTGAATGGAT 
401 CCATTGCTTT TTGAGATAAG ATATTTCTTG CTACACGAAT TGCTTCTACC 
451 CATTCTNTTT TATCTTCTTC TGTTGATAAA TAATTAAAGC GGATACTTGG 
501 TTTTTCGAAT GGATCTTTAG ATTTGATTGG CACGAGCTAC CACGAGAGTT 
551 TGAATACATT GGTCCTACGT GAACTTGATA ACCATGTGCG ACCGCTGCCT 
601 TTTGACCATC ATATCTTACA NCTATTGGTA AGAAATGGAA CATTAAGTTA 
651 GGATAATCAA CTTCGTTATT TGAACGTACA AATCCGCCAC CTTCAAAATG 
701 GTTAGATGCT GCTGCACCTG TACGTGTGAA AATCCAGTGG TAAACCAATT 
751 AAATGGCATG CGCCTTGATA TCTAAGCTTG GCTGTAATGA TACAGGTTTC 
801 CTTACATTTA TGTTGAATGT ATACCTCTAA GTGATCTTCC AAAGTTTTCA 
851 CCCACACCTG GTAAATGAAC ACGTGGCTCA ATGCCTTTTG ATTTTAGGAA 
901 CTCTGAATCA CCGATACCAG ATAATTGTAG TAATTGTGGC GTTATTGAAT 
951 GCCCC 

SEQUENCE 3 [SEQ ID NO: 41] 
gaagcaggac ccatttta 

SEQUENCE 4 [SEQ ID NO: 42] 
gattttcaca cgtacagg 

Gene #15 

S. aureus DNA Gyrase 

SEQUENCE 1 [SEQ ID NO: 43] 

1 GAATTCCTAC ATAATACTTT TGTTTACCTT GTGTCAGTTT ATACAACGGT 
51 GGCTGTGCAA TATACACATA GCCTGCTTCA ATTAACGGTC TCATAAATCG 

101 ATAGAAGAAT GTTAATAACA ATGTTCTAAT ATGCGCTCCA TCCACATCGG 

151 CATCAGTCAT AATGACGATT TTGTGATATC TTGCTTTCGC TAGATCAAAG 

201 TCGCCACCGA TTCCTGTACC AAATGCTGTG ATCATTTGAC GAATTTCATT 

251 GTTATTCAAA ATTCTATCTA ATCGTGCTTT NTCAACATTT AATATCTTAC 

301 CTCGTAATGG TAAAATCGCC TGCGTTCTAG AGTCACGACA GATTTTGGTG 

351 GACCCCCNGC AGAGTCCCCT TCGACTAAGA AAATCTCACA TTCTTCAGGA 

401 CTTTTACTAG AGCAATCGGC TAATTTACTG GAAGACTGCT ACATCTACGC 

451 TGATTTACGA GGTGTTACTT CAGGGCTTTN TCGAGACACG TGCANGT 

SEQUENCE 3 [SEQ ID NO: 44] 
cataatactt ttgtttacc 

SEQUENCE 4 [SEQ ID NO: 45] 
agtaacacct cgtaaatc 
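
As an illustration of how the primer pairs listed as SEQUENCE 3 and SEQUENCE 4 relate to the corresponding SEQUENCE 1, the sketch below searches a template for a primer on either strand. It is not part of the original disclosure: the function names (`reverse_complement`, `normalise`, `locate`) are hypothetical, only the first and last rows of SEQ ID NO: 43 are used as templates, and N is assumed to pair with N.

```python
# Illustrative sketch only; names are hypothetical and not from the source.
# Templates are the first and last rows of Gene #15, SEQUENCE 1 [SEQ ID NO: 43].

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")  # assumption: N complements to N


def normalise(seq: str) -> str:
    """Drop the spacing used in the listing and upper-case the bases."""
    return "".join(seq.split()).upper()


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return normalise(seq).translate(COMPLEMENT)[::-1]


def locate(primer: str, template: str):
    """Return (0-based offset, strand) of an exact primer match, or None."""
    t = normalise(template)
    pos = t.find(normalise(primer))
    if pos != -1:
        return pos, "+"
    pos = t.find(reverse_complement(primer))
    if pos != -1:
        return pos, "-"
    return None


FIRST_ROW = "GAATTCCTAC ATAATACTTT TGTTTACCTT GTGTCAGTTT ATACAACGGT"
LAST_ROW = "TGATTTACGA GGTGTTACTT CAGGGCTTTN TCGAGACACG TGCANGT"

forward_hit = locate("cataatactt ttgtttacc", FIRST_ROW)  # SEQ ID NO: 44
reverse_hit = locate("agtaacacct cgtaaatc", LAST_ROW)    # SEQ ID NO: 45
print(forward_hit, reverse_hit)
```

Under these assumptions the first primer matches the given strand near the start of the fragment and the second matches the complementary strand near its end, which is the arrangement expected of an amplification primer pair.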

Gene #16 

E.coli pts system ptkC 



SEQUENCE 1 [SEQ ID NO: 46] 

1 CTANCNAANG GAANTTCAGC ATCCTTAAAA ATACCTATTT GACTGTAGAA 

51 ACCTTTTGNT GCGTACAATA TCTAAACCTT GTCGTGCTGC TGGAACTGCA 

101 CCTGAACATT CAACAACAAC ATCTGCACCG TAACCGTCTG TAATTCCATT 

151 GATATACGTT TTTAAGTCTG TGTGTTGTAA ATTGACTACA TAATCCATGT 

201 GCAATGCTTC TGCTTTATCT AATCTGACTT NGTGGCANTG TCCAATCCAG 

251 TTACCACAAC AGGTGCGCCT TTACTTTTCA ACACTTGTGC TACAAGTAAT 

301 CCGATTGGCC CAGGTCCCAT TACAACTGCT ACATCGCCAG AGTTCACTTG 

351 AATCTTAGAA ACGCCATGAT GTGCACATGC TAATGGTTCT TGTCATAGCT 

401 GCAGACTGAT ACGATACTTC CGCTTCTGGA ATATGATNCA AACTTTCTTC 

451 ACGTGCAATG ACATAATTAG TAAATGCGCC ATCAACTTGT GTTCCAATAC 

501 CTTTTCGATG GTTGCATAAA TGATAGTTTT TTGATTTACA GGAATCACAC 

551 TCATTACANA CCATAGAATG TAGTTTCAGA AGTGACNCGG TCACCAACTT 

601 TAAAATCNTT AACGTCTGCT CCCAACTTCA ACGATNTCAC CAGAAAATTC 

651 ATGACCTAAT GTCACTGGAA AATTAACTTN ATAATGCCCT TCATAAGTAT 

701 GAAGGTCTGT GCCACAAATT CCTGCATAAT GTACTTTAAT CTTTACTTTA 

751 TCATCTAGCG GTGTTGCAAC TTCTTTATCA AGAAGTTCTA AGTTGCCATG 

801 TCCTTCTCTT GTTTTTACTA AAGCTTCCAC CACAAACACN TCGANTTTTT 

851 ANTTGNAATA GACTNNATAG NTTNAAGATA AGATAGTTAN CGATATTNCC 

901 ACCTTGATCA ATACTTGANA TTTCAGATGA ACCTTTTGNC ATTTGTACAT 

951 TCGTACCTTT CGCCATATCT GTGAAAATGG GTGCTACGTC TGTTGCAATA 

1001 TATAATGAAA TTGCAATCAT AATCGTACCC ACAATGACAG AATGAATAAT 

1051 GTTTCCTCTT GCTGCACCAA CAATAAACGC GACAACAAAT GGTATAGTTG 

1101 CTAAGTCACC AAAAGGTAGT ACTTGGTTTC CTGGTAAAAT AACGGCTAAT 

1151 AAAACAGTGA TAGGTACTAA AATTAATGCT GTCGAAATAA CCGCTGGATG 

1201 ACCTAATGCT ACAGCCGCAT CCAATCCAAT ATAAATTTCA CGTTCGCCAA 

1251 AACGTTTATT TAGCCATGTT CTTGCAGACT CTGAAACTGG CATTAAACCT 

1301 TCCATTAAGA TTTTTACCAT TCTAGGCATT AAGACCATTA CTGCAGCCAT 

1351 TGACATTCCT AAATTAATGA TGTCTCCAGG TTTGTAACCT GCTAACACAC 

1401 CAATACCTAA ACCTAAAATT AAGCCGACAA ATATAGACTC TCC 

SEQUENCE 2 [SEQ ID NO: 86] 

1 GESIFVGLIL GLGIGVLAGY KPGDIINLGM SMAAVMVLMP RMVKILMEGL 

51 MPVSESARTW LNKRFGEREI YIGLDAAVAL GHPAVISTAL ILVPITVLLA 

101 VILPGNQVLP FGDLATIPFV VAFIVGAARG NIIHSVIVGT IMIAISLYIA 

151 TDVAPIFTDM AKGTNVQMXK GSSEXSSIDQ GGNIXNYLIX XLXSLXQXKX 

201 RXVCGGSFSK NKRRTWQLRT S* 

SEQUENCE 3 [SEQ ID NO: 47] 
gttctaagtt gccatgtc 

SEQUENCE 4 [SEQ ID NO: 48] 
cctagaatgg taaaaatc 

Gene #17 

S. typhimurium adenine glycosylase 

SEQUENCE 1 [SEQ ID NO: 49] 

1 CCATTTAAAA GTATTGTAAA ATCATCCACN TTNTATAAAC CAACCACNTT 

51 AACNTTTTTG ACATTTGTTA TCCGATGAGA TTAAAAGATA TCAATNAATA 

101 CAATTTTTAN AATTAATGTC ACTATGTTTT CCGATAATAT NACCCAATCA 

151 TCGNAATGTT ACCCATTTAT AAAATGANAA ATCNTTGACA TAGGTANAGG 

201 GAATGTATAT TGGTCNCGGA TCACTTAAAT TAAACCCANA TCATGTCATC 

251 TGGTAATGTN TCAATGTTAA TTGCTCCTGA AGCGGCGTAN ACTTTAATCT 

301 TCCATGTTAA ATGAGTAAAT TGATGCGTCA ACTCNAAAAT AGGTGTTTCT 

351 NCTGGNTGAA TGTCATGACC GATTTTTTCA NTCATTTTAC GTCTANCATG 
401 CTCACTATCN AACATAGGAN ATTGCCACAT ACCATACNAT AATTNTTCCC 
451 TACGCTTTTG CAACAGATAT TGACCTTGAT TATTTCTAAT TAANAAGACG 
501 GATTGCTCAA TTACNTTTTT ACTTACATTT TTAGATTTAA CAGGTAACTT 

551 TTCAAATGGA CCTTTATCAA ATGCCTCACA GTTTTCTTGN ACTGGACNAA 
601 ATAAGCATAA TGGATTTTTT GGTGNACAAA TTAATGCCCC TAATTCCATC 
651 ATAGCTTGAT TAAACGTTCC AGCTTCTGTA GTAACATACG GTAACAATTC 
701 TTGTTCGTAC GATTTCCTCG TCGATTGTAA TTTAATATCT CGATAGTCAT 
751 CATTCAATCT AGACCATACG CGAAAAACAT TTCCGTCTAC AGTTGCTAGT 
801 GGTACATTAT ATGCAATGCT CATTACTGCA GCTTGTGTGT ATGGGCCAAC 
851 ACCTTTTAAC GCTTTAAATT GATCAGGATC TTTGGGAACT AAGCCTTCAT 
901 ATTTATCANA AACTTCTTTA ATCGCCGTAT GAAAATTTCG AGCTCTACTA 
951 TAATATCCTA AGCCTTCCCA ATACTTTAAC ACTTCATCTT CCGAAGCTTG 
1001 ACTCAAAACT TCCACAGTTG GAAATCGGNC ACCAAAACGA TGATAATAGT 
1051 CAATAACTGT TTTAACTTGT GTCTGTTGTA ACATGACCTC ACTTAACCAA 
1101 ATATAGTACG GATTGGTCGT TTGTCGCCAT GGCATTTCTC TTTGATTTTC 
1151 ATCAAACCAG TGTATCAAAT TTTCTTTAAA ACTAGACTGC TGATACATTT 
1201 ATAAAACCCT TTCCTCACCA AAATTAATTG TCTTTACTCA TAATGTTTTT 
1251 ATTGTACATT AAAATCATGG TTAGTATGTA AGTTAATTTA GTTATNTGCG 
1301 AAATTGGATT ATAATAGTAT ATATAATATT ATGAAATGAG TGAACTGATA 
1351 TGGACACTGC AACACATATC GCAATTGGGG TGGGCCTTAC AGCACTTGCA 
1401 ACTCAAGATC CAGCAATGGC TTCTACGTTT GGTGCAACAG CTACAACCCT 
1451 TATCGTTGGT TCATTAATTC CTGATGGGGA TANTGTNCTT AAATTANAGG 
1501 ACANTGCAAC ATATATTTCG NATCATAGAG GNATNACGTC ATNCCATCCC 
1551 CTCCCACAAN NNTATGNCCA GTCNCNTTTA CANTTTNTAT NTNTTCACGT 
1601 CACTNTNGCT GGTANGCATC CCNCCTCACG TATGGCTTGT GG 

SEQUENCE 2 [SEQ ID NO: 87] 

1 MYQQSSFKEN LIHWFDENQR EMPWRQTTNP YYIWLSEVML QQTQVKTVID 

51 YYHRFGXRFP TVEVLSQASE DEVLKYWEGL GYYSRARNFH TAIKEVXDKY 

101 EGLVPKDPDQ FKALKGVGPY TQAAVMSIAY NVPLATVDGN VFRVWSRLND 

151 DYRDIKLQST RKSYEQELLP YVTTEAGTFN QAMMELGALI CXPKNPLCLF 

201 XPVQENCEAF DKGPFEKLPV KSKNVSKXVI EQSVXLIRNN QGQYLLQKRR 

251 EXLXYGMWQX PMXDSEHXRR KMXEKIGHDI XPXETPIXEL THQFTHLTWK 

301 IKVYAASGAI NIXTLPDDMX WV* 

SEQUENCE 3 [SEQ ID NO: 50] 
tcctgaagcg gcgtatac 

SEQUENCE 4 [SEQ ID NO: 51] 
tatgaaggct tagttccc 

Gene #18 

S. aureus femA 



SEQUENCE 1 [SEQ ID NO: 52] 

1 GGGAAAAAAA GAAAACCTTC CAAAATACGG GAAATTGAAA TTAATTANCC 

51 GGAGAGACCA NATAGGAAGT AATTGATAAT GGAAGTTTCC CCANAATTTA 

101 ACAAGCTAAA AGAGTTTGGG TGCCTTTTAC AAGATAAGCA TGCCAATACA 

151 GTCATTTCAC GCACACTGTT GNCCACTATG AGTTAAAGCT TGCTGAAGGT 

201 TATGAAACAC ATTTAGTGGG AATAAAAAAC AATAATAACG AGGTCATTGC 

251 AGCTTGCTTA CTTACTGCTG TACCTGTTAT GAAAGTGTTC AAGTATTTTT 

301 ATTCAAATCG CGGTCCAGTG ATCGATTATG AAAATCAAGA ACTCGTACAC 

351 TTTTTCTTTA ATGAATTATC ANAATATGTT AAAAAACATC GTTGTCTATA 

401 CCTACATATC GATCCATATT TACCATATCA ATACTTGAAT CATGATGGCG 

451 AGATTACAGG TAAGGCTGGT AATGATTGGT TCTTTGATAA AATGAGTAAC 

501 TTAGGATTTG AACG 

SEQUENCE 3 [SEQ ID NO: 53] 
gaggtcattg cagcttgc 

SEQUENCE 4 [SEQ ID NO: 54] 
CAAATCCTAA GTTACTCATT 

Gene #19 

Parsley S-adenosyl methionine synthetase 
SEQUENCE 1 [SEQ ID NO: 55] 

1 CGCACATAAC GTGCAGCATA TGCAGCTGAG CGGTCTACTT TTTGTAGGAT 

51 CCTTACCACT GAAGCATCCG CCACCATGAC GTGCATAGCC ACCATACGTA 

101 TCAACAATGA TTTTACGTCC TGTTAATCCT GCATCACCTT GAGGTCCACC 

151 GATTACAAAG CGTCCTGTAG GATTGATGTA GAATTTAGTT TGTTCATTAA 

201 TCAAGTTTTC TGGAACAGTT GGATAAATGA CATGCGCTTT GATGTCTTCT 

251 TGAATTTGTT CAAGTGTCAC ATCATCAGCA TGTTGTGTTG ATACGACAAT 

301 CGTATCAATA CGTACTGGGT TATCATTTTC ATCATATTCA ACAGTGACCT 

351 GAACTTTACC GTCTGGTCGT AAATAATTCA ACGTCTCGNG CCATCTTTTA 

401 CGCACATCAG ATTAAACGTT TGGGGCAATT GGGTGTGATA AATTAAATTG 

451 CTAGAGGGAT GTACGTTTCT TGTTTCAAT 

SEQUENCE 3 [SEQ ID NO: 56] 
acgtgcatag ccaccata 






SEQUENCE 4 [SEQ ID NO: 57] 
acaagaaacg tacatccc 

Gene #20 

E.coli dipeptide permease 



SEQUENCE 1 [SEQ ID NO: 58] 

1 ACAACCCTNC AGTGCTTGGC CAATTAGGTA GAGAATTTNA CCTAGGTAAN 

51 TTAATGCGAT AAAGCCCAAG TTTGTAAAAT GTCCNTTGTG CGCCAATTTG 

101 TTCCTGTACN TANTGGGANC TATTTTAGGA TTCTTATCAG GGATATTTCC 

151 CAAGGGTTTT GTTGACNCCT TAATCATGCG TGCGTGTGAT GTTATGTTGG 

201 CAATTCCCCA AGTTATGTTG TAACGTTAGC ATTAATTTGC ATTGTTTGGA 

251 ATGGGTGCCG AAAATATTAT CATGGCATTT ATTTTGACGC GTTGGGCATG 

301 GTTCTGTCGT GTTATACGTA CAAGTGTTAT GCAGTACACT GCTTCTGACC 

351 ATGTCAGATT TGCTAAAACA ATCGGTATGA ATGATATGAA AATTATTCAC 

401 AAACATATTA TGCCGTTAAC ATTAGCAGAT ATTGCTATCA TCTCTAGTAG 

451 TTCGATGTGT TCAATGATCT TGCAAATATC TGGCTTTTCA TTTTTAGGAT 

501 TAGGTGTCAA AGCGCCTACT GCAGAGTGGG GCATGATGCT TAACGAAGCT 

551 AGAAAAGTGA TGTTTACACA TCCTGAAATG ATGTTTGNGC CAGGTATTGC 

601 CATAGGGATT ATAGTGATGG CATTTAACTT CTTATCCGAT GCTTTACAAA 
651 ATTGNTATTG GATCCCCCGC ATCTCTTTCT TAAAGATAAA CTTCCGCNCC 

701 TTGTGAAAAA AGGGAGTGGN GCAATCATGA CATTGTTAAC AAGCTAAGCA 

751 TTTGGCGATT ACAGATACCT GGACAGATCA ACCACCGTGA GTGATGTGAN 

801 TTTNNCAATT AACTAAGGGG TGAAACTCTA GGCNTTATTG GGGAAAGTGG 

851 TAGCGGT 

SEQUENCE 2 [SEQ ID NO: 88] 

1 MGAENIIMAF ILTRWAWFCR VIRTSVMQYT ASDHVRFAKT IGMNDMKIIH 

51 KHIMPLTLAD IAIISSSSMC SMILQISGFS FLGLGVKAPT AEWGMMLNEA 

101 RKVMFTHPEM MFXPGIAIGI IVMAFNFLSD ALQNXYWIPR ISFLKINFRX 
151 L* 

SEQUENCE 3 [SEQ ID NO: 59] 
atattatcat ggcattta 

SEQUENCE 4 [SEQ ID NO: 60] 
atctttaaga aagagatg 




Gene #21 

S.carnosus pts mannitol permease 

SEQUENCE 1 [SEQ ID NO: 61] 

1 GAATTCTTGC ACATGTTGCT CGGTGTCTTC CTTGCTGCAC TTGTATCATT 

51 CGTTGTAGCT GCTTTAATTA TGAAGTTCAC TAGAGAACCA AAGCAGGATT 

101 TAGAAGCTGC GACAGCTCAA ATGGAAAATA CTAAAGGGAA AAAATCAAGC 

151 GTTGCTTCTA AGTTAGTATC TTCTGATAAA AATGTTAATA CAGAAGAAAA 

201 TGCTAGTGGT AATGTTAGTG AAACATCTTC ATCAGATGAT GATCCTGAAG 

251 CGCTATTGGA TAATTACAAC ACTGAAGATG TTGATGCACA CAATTACAAT 

301 AATATAAATC ATGTTATTTT TGGCTGCGAT GCGGGTATGG GTTCTTNGGT 

351 GCAAATGGGG TGCAAGCATT GTTACNGTNA TTAAATTTTA AAAAGGCGGC 

401 AATTAATGAT ATTACAAGGG TACAAATTAC TGCGAATTAA TCAAATTGCC 

451 AAAAGATGCT CCAATTANGN TATCAACTCC AGAAAAACTA CTTGATCCGG 

501 GCTATTAACA AACACAATGC CATCCATATT CNAAGGGGNT TAATTTCCTA 

551 ATCACCAAGA TATGNAGGAC TTTTAATTAT CTTAAAAAGG TGG 

51 
SEQUENCE 2 [SEQ ID NO: 89] 

1 MIFGKGTAKA TSYGAGIIHF LGGIHEIYFP YVLMRPLLFI AVILGGMTGV 

51 ATYQATGFGF KSPASPGSFI VYCLNAPRGE FLHMLLGVFL AALVSFVVAA 

101 LIMKFTREPK QDLEAATAQM ENTKGKKSSV ASKLVSSDKN VNTEENASGN 

151 VSETSSSDDD PEALLDNYNT EDVDAHNYNN INHVIFGCDA GMGSSAMGAS 

201 MLRNKFKKAG INDITGYKYC D* 

SEQUENCE 3 [SEQ ID NO: 62] 
tgcacatgtt gctcggtg 

SEQUENCE 4 [SEQ ID NO: 63] 
GTGGTAATGT TAGTGAAAC 

Gene #22 

Mycobacterium phosphate sensor PhoR 

SEQUENCE 1 [SEQ ID NO: 64] 

1 GGCACGAGCG AGTTCATTAG CTATATATAA GCCTAATCCA GAACCACCCG 

51 TTTTTGTATT ACGAGAGTTT TCTACTCTGA ATGTACGTTC GAATATACGT 

101 TCTTGTAGTT CTGGTATAAT GCCAATACCT CNATCGCTAA TAGCAATGTC 

151 GATAGTATCT TGATCTTTGT TTTCACTAAT ATTAATATCA ATGCGACTAC 

201 CAACATTTGA AAATTTTAGC GCATTATCAA GTAAGTTTGT TAAAATACGC 

251 TCAAGTGGCG TTCGATATTG ATAAAATGCA TCAATTTCGC TACAGAAATT 

301 CACTTCTAAT GTGCGGTTTT CATGTTTGAT ACGTTGCTCC ATATGGTTGC 

351 AATATTGATA CAAGTAATTG GTCTAGTTGT ATTAATTCTG GGGGATATGT 

401 TTTACCTGTA TTTAAAGTTG ATAAT 



SEQUENCE 3 [SEQ ID NO: 65] 
tataagcctaatccagaacc 



SEQUENCE 4 [SEQ ID NO: 66] 
aacgtatcaaacatgaaaac 



Gene #23 
UNKNOWN 

SEQUENCE 1 [SEQ ID NO: 67] 

1 GTACGAGCTC GTGCCGGCAC GAGCGATTGG TGCAGTGAGT TATGTTTTAG 

51 AACAATTAGA TGCACCAGTA TATGGATCTA AATTGACAAT AGCGTTAATT 
101 AAAGAAAATA TGAAAGCCCG TAATATTGAT AAAAAAGTTC GCTACTACAC 

151 AGTTAACAAT GATTCAATTA TGAGATTCAA AAACGTGAAT ATTAGTTTCT 

201 TTAATACGAC ACACAGTATT CCTGATAGTT TAGGTGTCTG TATTCACCCT 

251 TCATATGGTG CCATTGTGTA TACAGGTGAA TTTAAGTTTG ACCAAAGTTT 

301 ACATGGACAT TATGCACCAG ATATTAAACG TATGGCAGAG ATTGGTGAAG 

351 AAGGCGTATT TGTCTTAATC AGTGATTCTA CTGAGGCAGA GAAACCTGGA 

401 TATAATACTC CCGGAAAATG TAATTGAACA TCATATGTAT GATGCCTTTG 

451 CCAAAGTGCG AGGTC 

SEQUENCE 3 [SEQ ID NO: 68] 
tttagaacaattagatgcacc 

SEQUENCE 4 [SEQ ID NO: 69] 
tccgggagtattatatccag 






Gene #24 

Anabaena nitrogen fixation gene 

SEQUENCE 1 [SEQ ID NO: 70] 

1 GGCCCAAACC CATCCAAGTC CTTTTTAATT GACTTATTTA CATTATTTCT 

51 TTAATTTGGA TTAACAAATT TTTTTCTATT TGANCCCTTT AATGTTNACT 

101 CCCCGTATCT AACAAGCAAG TGATCATACT TCATTATTTT AGCAACTCCT 

151 TAATTTCCTC ATAAATGATG ATAAATATTT CTTTAAACCT TGCTATATCT 

201 TCTTTAGTTG TAGTAGCCCC AAATGATAAT CTTATACTAC CTTCAATAGA 

251 TTTGTCTGAT AATCCCATTG CAGCCAATAC TTCATTTAAT TTATTACGTT 

301 TAGATGAACA AGCACTCGTC GTAGATATCA TAATGTCATA TTTTGAAAAA 

351 GCATTAACTA ATACTTCACC TTTTACGCCA GGAAAACTAA GATTTAAAAC 

401 GAATGGTGAA CCTGAAGTTG AAGAATTAAT ATAAACTCCA TGATATTTAT 

451 TTAAAAATTG ACGGACGTCA TTATTTAACT CAGTAACAAA TGCATTCAAT 

501 GCTTCAAAGT TTTCATTAGC TCGTGCC 






SEQUENCE 3 [SEQ ID NO: 71] 
ttttagcaactccttaatttcctc 

SEQUENCE 4 [SEQ ID NO: 72] 
gcacgagctaatgaaaactttg 



Gene #25 
UNKNOWN 

SEQUENCE 1 [SEQ ID NO: 73] 

1 GACAACTTGC TAAAGCACGT GATGAAAAAG TAAGTGAATA TGGAATTGAA 

51 CAAGCTGATG GTACATTAAT TCAATATGAT AGTGAAGCCA AGATATATGA 

101 ACATTTTAAT GTGAATTTTA TACCACCTGC TATGCGAGAA GATGGTAGCG 

151 AATTTGATAA AGATCTAAGT AATATCATTA CATTAGATGA TATTAATGGT 

201 GATATTCATA TGCATACAAC GTATAGTGAT GGTGCGTTTT CTATTCGAGA 

251 CATGGTAGAA GCAAATATCG CAAAAGGTTA TAAATTCATG GTAATTACTG 

301 ATCATTCACA AAGTTTACGT GTTGCTAATG GCTTACAAGT GGAAAGACTT 

351 TTTANGACAA AAACGAAGGA AATTAAGGCT TTAGATAAAG AATATAGTGA 

401 AATTGGATAT TTATTCAGGT ACAAGAAATG GATATATTAA CCTGATGGCT 

451 CGCTGGATTA TGATGATGAA ATTTNAGCAC AACTTGGATA TGTNATTGGA 

501 GCTATTCAAC AAAGCTTNAN CCAATCAGAA GAACAAATNA TGGAACGGAT 

551 TAGCTAATGC ATGTCGCAAT CCATACGTGC GACATATAGC GCATCCAACA 

601 GGGCGTATTA TAGGTAGAAG AGATGGTTAT AAACCGAATA TTGAACAATT 

651 AATGGCATTA GCTGAAGAAA CGAATACAGT ATTAGAAATT AATGCCAATC 

701 CACATCGACT GGATCTTGAA CGCTGAAATC GNTCGNNAAT ATCCAAATGT 

751 GAAATTAACT NTTAACACTG ATGGGCATCA TNCAAATCAA TTNGATTTTN 

801 TGGAATTATG G 

SEQUENCE 3 [SEQ ID NO: 74] 
acgtgatgaaaaagtaagtg 

SEQUENCE 4 [SEQ ID NO: 75] 
tcttgtacctgaataaatatcc 

Gene #26 

periplasmic binding protein 

SEQUENCE 1 [SEQ ID NO: 76] 

1 AGATCGTTCG CTAATTGACA ATTGATTAAA TCCCCTATTA CAAAATTGGA 

51 TATTACCTGT TATATCTAAA AATCCACAAA TTGCTTTAGC AAGTGTTGAT 
101 NTGNCGGCAC CATTGTGACC AACTATACTA AGCATTTCTC TTCTATAAAC 
151 ATTTAATTGA ACATTATTAA GTACACTATT ACTATAGTCA CTATATTGAA 
201 CACATACCTC ATTTAATTCT AATAGCGGCN CAGATGTGTA CTTATTATCA 
251 TTATGTGCAG ATGTNTCATC TATCCATTTN NNCACTTTAA NTTTAACATG 

301 TTCACTCATA CAAACGACAC GTAANTTCGC TAAGTTATCA ATGGATTCGA 

351 CATCTACTTC TGNATATTNA AGCGCTGNAC AGTATAATGG NACACGTATG 

401 CCTGCTTCTT TAAGCTTAGA TGATTTTAGC AAATCACTAG GCGTTGTATT 

451 AGCGATGATT TTTCCATCTT TAAAAAGAAG ANCTCTATCA AACGTATCAT 

501 CTAATGANTC TTCTAATCGA TGTTCGACAA TAATCATCGT TGACTTTGTT 

551 TCTTCATGAA TATTGTNTAA CAATCTCAGC GTTTCATGTC CTGTCGCAGG 

601 ATCTAAATTG GCCAGCGGCT CATCCAATAT TAAAATAGGC GTNCGATGGA 

651 TTAATATACC ACCTAATGAA ACGCTCGTGC C 
SEQUENCE 2 [SEQ ID NO: 90] 

1 GTSVSLGGIL IHRTPILILD EPLANLDPAT GHETLRLLXN IHEETKSTMI 

51 IVEHRLEXSL DDTFDRXLLF KDGKIIANTT PSDLLKSSKL KEAGIRVPLY 

101 CXALXYXEVD VESIDNLAXL RVVCMSEHVK XKVXKWIDXT SAHNDNKYTS 

151 XPLLELNEVC VQYSDYSNSV LNNVQLNVYR REMLSIVGHN GAXXSTLAKA 

201 ICGFLDITGN IQFCNRGFNQ LSISERS 

SEQUENCE 3 [SEQ ID NO: 77] 
aattgacaattgatcaaatcccc 

SEQUENCE 4 [SEQ ID NO: 78] 
gccaatttagatcctgcgac 
SEQUENCE LISTING 

(1) GENERAL INFORMATION 

(i) APPLICANT: Burnham, Martin 
Hodgson, John 

(ii) TITLE OF THE INVENTION: Novel Compounds 

(iii) NUMBER OF SEQUENCES: 91 

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: SmithKline Beecham Corporation 

(B) STREET: 709 Swedeland Road 

(C) CITY: King of Prussia 

(D) STATE: PA 

(E) COUNTRY: USA 

(F) ZIP: 19406-0939 

(v) COMPUTER READABLE FORM: 

(A) MEDIUM TYPE: Diskette 

(B) COMPUTER: IBM Compatible 

(C) OPERATING SYSTEM: DOS 

(D) SOFTWARE: FastSEQ for Windows Version 2.0 

(vi) CURRENT APPLICATION DATA: 

(A) APPLICATION NUMBER: 

(B) FILING DATE: 25-FEB-1997 

(C) CLASSIFICATION: 

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 9604045.6 

(B) FILING DATE: 26-FEB-1996 

(viii) ATTORNEY/AGENT INFORMATION: 

(A) NAME: Gimmi, Edward R 

(B) REGISTRATION NUMBER: 38,891 

(C) REFERENCE/DOCKET NUMBER: GM50007 

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: 610-270-4478 

(B) TELEFAX: 610-270-5090 

(C) TELEX: 

(2) INFORMATION FOR SEQ ID NO: 1: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2111 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1: 

CTAGGAGTAG TATTTGGTTC ATGATTGCCT AATTCAATCA CATCTTTACT TTGCTCTAAG 60 

TGCAAATCAC GCAATTGACC ATNTGGATCT CGTCTATCAT AGTCATAAAT ACGGTATGTC 120 

GTATCGGATG ATTGTTGTGT CTCTAAAATT AAAATACCCG AACCAATGGC ATGGACAGTG 180 

CCAGCAGGAA CATAATAAAA GTCACCGGGC TTAACAGGTA TACGTTTGAA AAGACTGCCA 240 

AATTCATGAT TATCAATCAT GTCGATTAAC GCCTGTTTAT TATGTGCATG GACGCCATAA 300 

TATAATTTCA GCACCTGGGC TGCATCTAAA TATACCAACA TTCTGTTTTA CCTAGTTCGC 360 

CTTCGTGTTT TAAAGCGTAG TCATCATCTG GATGAACTTG AACAGATAAT TTATCATTGG 420 

CATCTAATAC TTTAGTTAGC AGAGGGAAAC TATCTCGTGA ATCATTATCG AATAATTCAC 480 

GATGTTGTGA CCAAAGTTGA TCTAGGGTCA TATCCTTGTA TGGACCATTG ATAATTGTAT 540 

TAGGACCATT TGGATGTGCA GAAATTGCCC AGCATTCACC AGTTGTTTCA TTAGGGATAT 600 

CATAGTTAAA TGCTTTTAAT GCATGACCGC CCCAAATTCT GTCTTTAAAA ACGGGTTGTA 660 

AAAATAATGC CATAGTTAAA ACTCCTCTAT ATTTTCATTA ATAAGTTATA AATTTCTGTA 720 

GTACTGTTGG CATTAATTAG TGATTGGCGT GTCTCATCAT TCATTAACGC TTTAGATAAG 780 

CGCTGAAGTA TTTTTAAATG TGTATCCTGA CTGTTGTTTG GTACGGCAAT TAAGAATATC 840 

AATTGAGGTA GACTACCATC TAGACTGTCC CATTTAACAC CATGATTATT TTTCATAACA 900 

GCTACAATCG GTTGTTTTAC AACATCAGAC TTTGCATGTG GAATGGCCAC GTTCATGCCA 960 

ATAGCTGTCG TAGACTCCAT TTCACGTTCT AGTATTGCAT TTTTTAAATG CGATGTGTGC 1020 

TCTACATAAC GGCAAATTTT AAGTTTATGA ATCAACATAT CAATTGCTTC GTTTCGAGAC 1080 

ATGTCGTGAT CAGTAATTAT CATAGTTTGT TGATCAAAAA CATGAGAAGG TTTATTGAGA 1140 

TGTGAATGTT TCGCTCGTGC CATCNACATT GTCAACCTCT GTATCATGTT GTGTAATATC 1200 

TGTATCATGA AGTTGCGTGT GTTGCGCTGG TGCATCTACT GCTATAACTG GTGTATTGCG 1260 

TNTTAATAAT AGTACAGTAG GCATTGTGAC AAGACTACCT ACTATCNCTC CAAAGATAAA 1320 

CCATAATACA TGATCAATAC CACCTAATAC AGCCACGATT GGACCTCCAT GTGCGACTCT 1380 

ATCGCCGACA CCACCAATGN CTGCAATGAC TGATGCAATC ATTGCACCAA TGATGTTTGC 1440 

AGGTATAATG CGCAATGGAT CTTGGGCTGC GAAAGGAATA GCACCTTCAG TAATNCCAAA 1500 

TAGTCCCATA GTGAAGGNAG CCTTACCCAT TTCTCTTTCG GAATGATTGA ATTTATACTT 1560 

NTGAACANAC GTTGCTAAAC CTAAACCGAT TGGTGGTGTA CATACANCAA CTGCGACCAT 1620 

ACCCATAACG GCGTAATTAC CTTCAGCAAT AAGTGCTGAG CCAAATAAAA ATGCTACCTT 1680 

GTTTAATTGG ACCGCCCATA TCGAAGGCGA TCATCGCACC TATAATCATC GACAAGTATA 1740 

ATAATATTAG CACCTTGCAT ACTTTTTAAC CAGGGTTGTT AGGAATGCCG CAAAAATATT 1800 

AGAAATCGTG CACCGATTAA AAATATAAAT ATCAATCCTA ACAACGACCG ATGAAATAAT 1860 

GGGAATAATA ATGATAGGCA TAATTGGTGC CATTGCTTTT GGAACTTTAA TATCTTTAAT 1920 

CCACTTTGCG ATATAACCTG CTAAGAAACC AGCAACAATA CCACCTAAAA ATCCTGCGCC 1980 

TGCATCACTG CCATAAAAAC TACCGTCAGC AGCGATAGCG CCGCCAATCA TACCAGGAAC 2040 

AAGACCGGGC TTGTCAGCGA TACTAACAGC GATATATCCA GCTCGTGCCG AATTCGGCAC 2100 

GAGCTCGTGC C 2111 

(2) INFORMATION FOR SEQ ID NO: 2: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2: 
ACCCTCTGTA TCATGTTG 18 

(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3: 
GTGCGATGAT CGCCTTGG 

(2) INFORMATION FOR SEQ ID NO:4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 809 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 



CGGCTCTTCG TAATATTGAT AATGTGCAAT ATTTNAAGAA TAATCAATTT ATTGAAGAAG 60 

AAACCGTAGT GACCGTGAGC GAATATCGAA NCGGCTATTG ATAGAATACG TACTGAAATG 120 

GACCCGAATG AATATCGAAG NCGATATAAA TGGTAGACCT AAACATATTT ACAGTATTTA 180 

TCGGNAAATG ATGAAGCAGA AAAAACAATT TGATCAAATT TTTGATTTGT TGGCGATACG 240 

TGTTATTGTC AATTCTATTA ATGATTGTTA TGCGATACTT GGGTTGGTGC ATACGTTATG 300 

GAAACCGATG CCAGGACGTT TTAAAGATTA TATTGCAATG CCTAAACAAA ATTTGTATCA 360 

GTCATTGCAT ACTACAGTAG TAGGTCCAAA TGGAGACCCG CTCGAAATCC AAATACGAAC 420 

GTTTGATATG CACGAAATTG CTGAGCATGG TGTTGCAGCA CACTGGGCTT ACAAAGAAGG 480 

TAAAAAAGTA AGTGAAAAAG ATCAAACTTA TCAAAATAAG TTAAATTGGT TAAAAGAATT 540 

AGCTGAAGCG GATCATACAT CGTCTGACGC TCAAGAATTT ATGGAAACCT TATAATATGA 600 

CTTACAGAGT GACAAAGTAT ACGCATTTAC CCCAGGGAGT GATGTTATTG AGTNGGCATA 660 

TGGTGCTGTG CCGATTGGAT TTTGGCTTAT GCGAATCACA GGGAANGTAG GTAATAAGAT 720 

GATTGGCGCC CAGGTGGAAT GGCAAAATTG TACCANATTG ACTTATNTTT TCACAAAACA 780 
GGCGGATATT GTTGGAAATA CCGTTCTAG 

(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
AGATACGTAC TGAAATGG 

(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
CCTGTGATTC GCATAAGC 

(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1090 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 

GTGATGTGGC TAAACGCTTA AATGCAAATA TATATGTGTC TGGCGAAGGT GAAGATGCAT 60 

TAGGGTATAA AAATATGCCA TCAAAAACAC AATTTGTTAA ACATGGAGAT ATCATTCAAG 120 

TAGGCAATGT TAAATTAGAA GTTCTGCATA CTCCAGGACA CACGCCTGAA AGTATTAGCT 180 

TTTTACTCAC TGATTTAGGT GGTGGNTCAN GTGTTCCGAT GGGATTATTT AGTGGTGACT 240 

TTATTTNTGN TGGTGATATA GGTAGACCTG ATTTATTAGA AAAATCTTGT TCAAATAAAG 300 

GGTTCGGCAC GAAATTAGCG CGAAACAAAT GTATGAGTCC GATCAAAATA TTAAAAATTT 360 

ACCAGACTAT GTTCAAATCT GGCCGGGTCA TGGTGCTGGA AGCCCTTGTG GTAAAGCATT 420 

AGGTGCCATA CCTATATCTA CAATAGGTTA TGAGAAAATT AATAACTGGG CATTTAATGA 480 

AATTGATGAG ACTAAATTTA TTGNNTCATT AACATCAAAT CAACCAGCAC CACCNCATCA 540 

TTGTGCACAA ATGAAACAAG TTANTCAGTG TGGCATGAAT TTATNTCAAT CATATGATGT 600 

TTATCCNAGC TTAGATNATA AGAGAGTAGC ATTTGATCTT CGCGTAGCAA AGAGGGCTTT 660 

CACGGGTGGC CACACAAAAG GAACAATCAA TATACCATAC AACAAAAACT TTATTANTCA 720 

           GTACTTAGAT TNTGAAAAAG ATATAGATTT AATTGGAGAT AAATCTACTG 780 

TTGAGAAAAG CGAAACACAC TTTACAATTA ATTGGGTTTG ATAAGGTAGC AGGCTATCGT 840 

NTGCCAAAAT CAGGCATTTC ACCCCAGTCC GNTCATAGCG CTGATATGAC AGGTAAAGAA 900 

GAACATGTAT TAGACGTACG TAATGATGAA GAGTGGAATA ATGGACACTT AGNTCAAGCA 960 

GTTAATATTC CACATGGTAA ATTATTAAAT GAAAATATTC CTTTTAATAA AGAGGATAAA 1020 

ATATATGTAC ATTGTCAGTC AGGTGTTAGA AGNTCAATTG CAGTGGGGTA TATTGGGAAA 1080 

GCAAAGGCTT 1090 

(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 





(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8: 
TTCGGGTGTT TTACCTTC 

(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
TGCAGCAAGC CTTTTCTC 

(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 2247 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 



AGCAGAATCT TTTTTAGCAT GATCTGTCAT AATGATCATA CGCTCTGGAT TTAAATCAGC 60 

TAAATGTTCA GTGTCTAATT GTAAGTAAGG TCCTTTCAAA TATTTACTTA AACCTTGTGT 120 

TACATCGTCA CTTAATGCAT TTTTAAATCC TAGNTCGTTT AAAAATTGTC CAACATATGA 180 

ATAGTGTGGA TGTGCTAATA AACCAGCTTT AGCAACTACT GCTGGAAGCA CTTTGTGATT 240 

TCTATCAAAT TTAATTTCAT CTTTATACTT ATTGATTAAT TTATCATGCT CAGCAAGACG 300 

TTTNNCGCCT TCTTTNTCTT TATTTAAAGC TTTAGCAATT GTTGTTGAAC GAATTAATAT 360 

TGTGGGTGTA GTCTCCATCA AAACTCTTTA ATGATAATGT GGTGCAATGT GGGCTAATTC 420 

TTTATTAATA CCCTTATGTC TACTGCTATC AGNGATAATT AATCCCGGNT TTAATTTACT 480 

AATNTCTCTT AAGTTNGCTT GTTACGTGTA CCTACAGAAG TATTACCCCC AATTTTTCTC 540 

TTACTGGGTT ATGATACGTT TTTTCTTACC ATCATCAGCA ATACCAACTT GGTNTAACGG 600 

CTATATGCTG NTAATGCAAC CTTGCAAATG AGTACTCTAA TACAACGATA CGTTGTGCAT 660 

CTTTAGGTAC TTTTACTGTA CCATTTTCAT CTTTTACCCG AAATAGTATC TTTAGTTGAT 720 

GATTCTTCTT TTACTTGAAT TATCCGTATT ACCACAAGCT GCAACTAAAA GTAAGGCAAC 780 

TATTAATCCC AATATACTAA AAGTTTTTAG ACCTCTCATC NGTCCCACTC CTTAATATGT 840 

ATANCTTCAT TTATTATTTT ATTGATAACA ATTATCATTG TCAAGTAGCG TTCAATCTTT 900 

TTTATATTTC TAAAATGTAT GACTATATAT TTCCTCTAAT AATTATGACT ACAATTAGCA 960 

CATTTCCTTA GACAAAATAC TGATAATGTA TCATTGCTAT ATCATCTTTG CATTAATACA 1020 

ATTGACACCA CTTAGCATGA CCGNTATCCC TGTAATTCAG CTGATATTAT CTGTTGCAAT 1080 

TTTATGTGAC GAACTGTTGC ACTTAATTTG ATAANTCAAC AANTACAANA NATCTAAGTT 1140 

GAACAATTAT GATACAACCG TGCAAACGAT ATGTAGTATA ACTTGTCAAC TTAGAATTAT 1200 

TGATAAATAT ATTAATATTG GTTTACCATA GCAGGAGATT TCACATCAAA ATTTTGAAGT 1260 

AGCGTATCAA TCTTTGAATC ATCAATATAT ACCTTATGTA AATTTTTCAT ATACATCGAA 1320 

TGAGAAAGTG CTTCATAATT TAATGAAAAA GATATATGAT CTCCAACTTG ATAGTGTCCT 1380 

TGACCATTTA AATCAAGCAT TAAATGATCA CTCGAAGCGC CTAAAATATT GATATGCTGA 1440 

TCCATAGGTG AAATATTATC GACTTGTGTA TCTNAAATAA CCAATATCTA CAATAGCTTG 1500 

TAAGAATGAT TCATGCGTGT GTGTATTAAC TCGAGGTTTA ATTTCTAAAA TCTCAGCCTC 1560 

CAATGTAATC GCATCTTGAT ATAACATAGC GAATCGCTTG ATTTGCGTTG TTTCAACAAC 1620 

TCTAAACAAC GTNTCANCTA TTCGGAANTC AATTTATTTT TACCCAAATC AATATATAAA 1680 

AGGTGGGGGG NAACATGCTC CGAATTACCA CCCGGAAATA ATTTNCANTC GATATCCTAT 1740 

TTCTCTTNCA ACAGCTGAGA CGAATCGATT AATCATAAAG ATATCANCAC CACTTGGCGC 1800 

ATCAGATTTA AAACACATAA AATTGAATGC TAAACCTACA AAATGGATAT TTTNCAAGTG 1860 

AATAATCTCT TTANTATAAT CTAAAACATC ATAAGTCAGA ACACCTTCAC GGACATCTTT 1920 

CCAATCTACC ATTAATAAAA TCTTATGTTT TTTTCCTAAA ACTTCTGCTA CTTCATTTAT 1980 

NTGATGTATG GTAGATAATT CTGTGTGGAT ACTCATATCA ACTTTCCTCT ATCATATCTG 2040 

AAATCTCTTT TGNGGGAGGC GTACGCAATA ACGTATATGT TAAATCCTGA TCTGCAATAC 2100 

TAATTATGTT ATCCAATCTG GATTCTGCAA CATGATTGAT ACCTAACGCT TTTAAGCTTN 2160 

CTACAATGGT ACGGGCANCA GCTATACACT TAATTACTGG TGTGANTNGN ATATTTTTAC 2220 

TTTGAAAACT NNGTGGAGGT ACTTGGG 2247 



(2) INFORMATION FOR SEQ ID NO: 11: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
TGTAAGTAAG GTCCTTTC 18 

(2) INFORMATION FOR SEQ ID NO: 12: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
TAATACTTCT GTAGGTAC 18 

(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1789 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 

GGCACGAGCG GCACGAGCGT GTTGTATCAA GATTTTGTAG GCAGTTTTAC AACGTCCGAT 60 

TCAGCAAGTT ATGCACAAGA TTTTAAATCT GAGGAAAACG CTAAAAAGAT TGCTGAAACT 120 

TTAAATCTTT TATATCAATT AACAGGCAAT CAAAACGGTG TGAAAGTTGT GAAAGAAGTT 180 

GTGGATAGAA CTGACTTGTC ATCTGATAAA TCAGTTGATA GCGAAACAAT GTAACTATAC 240 

TAAGTTATGA GCATTACGCT CATAGCTTTC TTAGAAAGTA GGTGTAGTTT TGGATGATAT 300 

TCAGAAAATA AAAAAAGAGC TTTCTGAATT AGTTGAACGT GTTGATGATG TTGAAATACT 360 

AGCAAACGAA ACAGCTGATC ATGTGCTTGA ACTTAGAGAG GAACATAAGC AACATCATAA 420 

TGAACTAAGA GAATCTCATA AAGAACTTAA AGATAAGCAA GATAAAGTTG TAGATGAGAA 480 

TTTAGAGCAA ACAAAGATAT TAAACAGAAT TGAAGAAAGA TATCANACGC AAGTAGNTGT 540 

TGNGCAAAAA AATGAAGAAA AGACACTCGC CCAAAATAAA TGGCTCGTAG GTGCCATATG 600 

GGCGCTTGTA ACAATTGTTA TGATTGCAGT CATTACTGCA TCAATTNCTG CGTTATTACC 660 

TTAAGGGAGG TGGACATAAT GAGTTGGGCA AGATGGTTAT CATGTTATTT GTNTGGTCGT 720 

AAATGTAAAT AATGTTTTTG GTCAGTGCAT CGGCACTGGC TTTTTATTTT GATTGAAAAG 780 

AGGTACGTAC ATGGTATTAC ACAGCTCACA AGACAGGAAG CATACTCCAA GTGAAGTTGG 840 

GAAGTGTTGT TAATACCAAG TAAGTAGGAT ATCTGANATG TATAATAGAG TAAAAATGAA 900 

ATCTTTTTAT TATAGACACA TATAAAAAGT GTATAGTAAT ATATGTATGT ATAATTAAAT 960 

GATAATCATT TCATAATTAT TGTATATAAC TAAATAACTA CTTAACANAA ATAATTATGC 1020 

TTTAGAGNTG ACCANNATGA NNNANNCCAG CATTTACATT ACTTTTATTC ATTGCCCTNA 1080 

CGTTGACNAC AAGTCCCANT TGTAAATGGT AGCGAGAAAA GCGNAGNAAT AAATGCGAAA 1140 

GATTTGCGAA AAAAGTCTGA ATTCCAGGGN ACAGCTTTAG NCAATCTTAN NCANATCTAT 1200 

TATTACNATG NNANAGCTAN AACTGAAAAT AAAGAGAGTC CNCGACCACA TTTTTACAGC 1260 

ATACTATATT GTTTANAGGC TTTTTTACAG ATCATTCGTG GTATANCGAT TTATTAGTAG 1320 

ATTNTGATTC NNAGGATATT GTTNATAAAA ATAAAGGGNA AANAGTAGAC TTGTATGGTG 1380 

CTTATTATGG TTATCAATGT GCGGGTGGTA CACCACACAA AACAGCTTGT ATGTATGGTG 1440 

GTGTAACGTT ACATGATAAT AATCGATTGA CCGAAGAGAA AAAAGTGCCG ATCAATTTAT 1500 

GGCTAGACGG TAAACANAAT ACAGTACCTT TGGAAACGGT TAAAACGAAT AAGAAAAATG 1560 

TAACTGTTCA GGAGTTGGAT CTTCAAGCAA GACGTTATTT ACAGGAAAAA TATAATTTAT 1620 

ATAACTCTGA TGTTTTTGAT GGGAAGGTTC AGAGGGGATT AATCGTGTTT CATACTTCTA 1680 

CAGAACCTTC GGTTAATTAC GATTAATTTG GTGCTCAAGG ACAGTATTCA NATACACTAT 1740 

TAAGAATNTA TAGAGATAAT AAAACGATTA ACTCTGAAAA CNTGCGTAG 1789 



(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14:
ATCCCCTCTG AACCTTCC 

(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs 
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
AAATGGTAGC GAGAAAAG 

(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 3797 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 

TCAAATGCAG TCAGGGAAGC AATAGGACGA TATGCATAAA GGAGATGGTA AAGTGGAACA 60 

GTGACAGAAG GTAAAGACAC GCTTCAATCA TCGGAGNCAT CAATCAANCA CAAAATAGTA 120 

AAACAATCAG GAACGCAAAA TGATAATCAA GTAAAGCAAG ATTCTGGAAC GACAAGGTTC 180 

TAAACAGTCA CACCAAAATA ATGCGACTAA TAATACTGAA CGTCAAAATG ATCAGGTTCA 240 

AAATACCCAT CATGCTGAAC GTAATGGATC ACAATCGACA ACGTCACAAT CGAATGATGT 300 

TGATAAATCA CAACCATCCA TTCCGGCACA AAAGGTATTA CCCAATCATG ATAAAGCAGC 360 

ACCAACTTCA ACTACACCCC CGTCTAATGA TAAAACTGCA CCTAAATCAA CAAAAGCACA 420 
AGATGCAACC ACGGACAAAC ATCCAAATCA ACAAGATACA CATCAACCCG CGTGCCTCAA
ATCATAGATG CAAAGCAAGA TGATACTGTT CGCCAAAGTG AACAGAAACC ACAAGTTGGC 
GATTTAAGTA AACATATCGA TGGTCAAAAT TCCCCAGAGA AACCGACAGA TAAAAATACT 
GATAATAAAC AACTAATCAA AGATGCGCTT CAAGCGCCTA AAACACGTTC GACTACAAAT 
GCAGCAGCAG ATGCTAAAAA GGTTCGACCA CTTAAAGCGA ATCAAGTACA ACCACTTAAC 
AAATATCCAG TTGTTTTTGT ACATGGATTT TTAGGATTAG TAGGCGATAA TGCACCTGCT 
TTATATCCAA ATTATTGGGG TGGAAATAAA TTTAAAGTTA TCGAGGGAAT TGAGAAAGCA 
AGGCTATAAT GTACATCAAG CAAGTGTAAG TGCATTTGGT AGTAACTATG ATCGCGCTGT 
AGAACTTTAT TATTACATTA AAGGTGGTCA CGAGCGTAGA TTATGGCGCA GCACATGCAG 
CTAAATACGG ACATGAGCGC TATGGTAAGA CTTATAAAGG AATCATGCCT AATTGGGAAC 
CTGGTAAAAA GGTACATCTT GTAGGGCATA GTATGGGTGG TCAAACAATT CGTTTAATGG 
AAGAGTTTTT AAGAAATGGT AACAAAGAAG AAATTGCCTA TCATAAAGCG CATGGTGGAG 
AAATATCACC ATTATTCACT GGTGGTCATA ACAATATGGT TGCATCAATC ACAACATTAG 
CAACACCACA TAATGGTTCA CAAGCAGCTG ATAAGTTTGG AAATAGAGAA GCTGTTAGAA 
AAATCATGTT CGCTTTAAAT CGATTTATGG GTAACAAGTA TTCCGAATAT CGATTTAGGA 
TTAACGCAAT GGGGCTTTAA ACAATTACCA AATGAGAGTT ACATTGACTA TATTAAAACG 
CGTTAGTAAA AGCAAAATTT GGACATCAGA CGATAATGCT GCCTATGATT TAACGTTAGA 
TGGCTCTGCA AAATTGAACA ACATGACAAG TATGAATCCT AATATTACGT ATACGACTTA 
TACAGGTGTG TCTTCACATA CTGGTCCATT AGGGCACGAA AATCCTGCCG AATTAGGCAC 
GAGACATTTT TCTTAATGGA TACAACGAGT AGAATTATTG GTCATGATGC AAGAGAAGAA 
TGGCGTAAAA ATGATGGTGT CGTACCAGTG ATTTCGTCGT TACATCCATC CAATCAACCA 
TTTATTAATG TTACGAATGA TGAACCTGCC ACACGCAGAG GTATCTGGCA AGTTAAACCA 
ATCATACAAG GATGGGATCA TGTCGATTTT ATCGGTGTGG ACTTCCTGGA TTTCAACACC 
GTAAGGTGCA GAACTTGCCA ACTTCTATAC AGGTATAATA AATGACTTGT TGCGTGTGGA 
AGCGNCTGAA AGTAAAGGAA CACAATTGAA AGCAAGTTAA ATTCATCTTC TGAATTTAAT 
AGGCTATGTA AATCGTGCTG TTATCATGGC ACATCAGATA TAAGTAGCAT CACAGTGTTG 
AATCTCAAAA TAGTAAAGTG AAATAAAGCG CCTGTCTCAT TAGCGAAAAC TAAAGGGACA 
GGCGTATCTG TTTATGAGCT TAATAAATTG TATGAATAAT ATGGTTGATC GAATAACTGT 
TTATCATTGA TGATAAATTT GAGTTTTTTA AAAATAATTG ATATATTACA CCATTGTTAT 
AGCGTTTAAA GAAATCAACC CAACTTTACG ATAAATAGTG ATTGCTTCGT CATTAGGTCT 
ACGATCAAAA TCATGCTCGT TTTTATTCAC GCGTTCAAAT GTTGAATGTG GAACATGATT 
CATGATATGT TCGCTTTCCT CAACGGGAAC ATCATAATCG CCATTACAAT GCGCAATGAA 
AACAGGTGGA AGTGTTTTAA GNTCATCTGG TGCAATATTA TATTTTGAAT CAGTATAATC 
ANCAATGTTA ATCATATTTA TCCATTTACC TGTGCCACGT GCATAAACGT AGAGTAAAAA 
ACGTGTGCGA TTTGATCTTG ANCAACCGGT GTTGGTGAAG TGAGTTGTCC AATCATTGTT 
TCGTTTATGC TTTGAGCTAT TTTTGCGTAA TACCTATTAG TTGTTTTAAA AGGGTTCAGT 
GTTGATGCGA CTATAACCAT AAAAATCAAT AACACCATCA ATATCTCTGT CTCGTGCAAT 
TAATAAGACT TAAATATGCA CCTGATGATC TGCCAAAGGT AAAAATAGGG CAATTAGAAT 2700

ATTGTGATTG AATCGCATCG AATGATGCGT AGACATCCTC AATAATGCAA TCGAGACTTA 2760 

CTTCTGGTAA TAAACGATAA CTTAGTTGAA TTAAATCGTA ATGTTCCGTA AGGATATCGA 2820 

TATACTGTGG GGATAAATCG TTAGCTTTAC CGAACATTAA TCCACCACCG TGGATGTAGA 2880 

CAATAACGCC TTTTGTTGGT TGATTTTTTG CTTTAATAAT TGTGTAAGGT AATGCAAATG 2940

CATCTTTAGT AATTACTTTA TATTTAATTT CAGTCACGAT TTAATAGGCT CCTTAGGAAT 3000 

CCGATATTGA TGTCATTATA ACACTGTCNT NAATTTCCAT GNAAAATAGT CTTAAGACGA 3060 

TGAGTCATGA TAATTCTGTT CCAATTGACG TAAAGCGTCN CGGGTATGCT TCTTTAGACC 3120 

TTCCCCATAA TCCATCATTT TAACAATATC TTTAAAAGCA GCATGTGGNA TGGCTAAATC 3180 

TTCTAAATCT GCCATAGAAA ATTCAAGATT GATATCATGT GGTCGCTGTT CAGCAAGTTT 3240 

ATGCACAAAG TCAGGTTCTG TGACCAAAGG CGAAGACATG CCGACCATAT CTGCATGTTG 3300 

TAAAGCATCT AAAGCAGACT CTGGAGAATT AATCCCGCCA CTTGCAATTA AAGGGATACG 3360 

ACCTGCTAAA TGTTCATAGA CAATTTGGTT AACTGGTCGA CCGAAATGAT CACCTGGTGT 3420 

ACGAGACGTA TTTTGATAAA TATGTCGACC CCAGCTAGCG ATTGCTAAGT ATTGGATGTT 3480

TGAAACGTCC ATGACCCAAT CGATTAATTG GTTGAACTCG TCAATGGTAT ATCCTAAATC 3540 

ACTGCCTCTG GTTTCTTCTG GCGTTGCTCG AAATCCTAAA ATAAAATTGT CAGGTGCTTC 3600 

TTTATCAATC ACTTCTTGTA CCGCACGCAT AACTTCTAAA CATAATCTTG CACGATTTTT 3660 

TAATGAGTCG GCACCGTAAT GGTCTGTACG TCTATTTGAA AAAGTTGAGA AAAATGTTTG 3720 

AATCAGCAAA CGTTGTGCAA TCGAAATTTC CACACCATCA AAACCTGCTT TAATCGCGCG 3780 

TGCATCGAGC TCGTGCC 3797 

(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17:
GACTAATAAT ACTGAACG 18 
(2) INFORMATION FOR SEQ ID NO: 18: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18: 



TCTGTCGGTT TCTCTGGG 18



(2) INFORMATION FOR SEQ ID NO: 19: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1422 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 

CAGGCGTTTC CTCNGGTACN TGTTGCNNGC CTTTAATTAC CGACNCTGCA ATANCCAAAC 
CGACCAGGTC GGATAGGGNA TATGTACCTG TTTTAGGACG ACCAATCGCT TGCCCAGTTA 
AAGCATCCAC ATCTACNATG CTTANCTTGT GTTGCTCGGC GCGATACAGA ATATCATTCA 
TTGTGTGCGT GCCGACTCTA TTTGCGACAA AGCCAGGCAC ATCATTGACG ACAATGACAC 
CTTTACCTAA TACATTGTGC GCGAAATTTT TTACATCTAA TATGATAGAT TCCTTCGTGT 
GTGACGTAGG TATTAACTCC ACTAATTNCA TAATACGTGG TGGGTTAAAG AAATGTAGAC 
CAAAGAATCG CTCTTGATCC TTCTCGTTAA ATGCTTGAGC AATCGCATTA ATTGGGATTA 
CCTGATGTAT TTGTAGCAAA TAAAGCATCT TCTNTAGCAT GTTGTAGAAC TTGTTGCCAA 
ACAGCATGCT TAATTTCAAT ATCTTCTTTG ACTGCTTCGA TATATAAATC AGNATCATCA 
TTTACCAAGT CATCATCAAA ATTACCATAT GTTAAATGAC TCACTAGATT TAAGTCGAAT 
AGTAGCGGCC GTTTCTTATC TGTAATTTTA TCGTAAGATT TTTTCGCAAT GAGATTTGGA 
TCGTTTGTGT CCACTACAAT ATCTAATAGT TTTACTTTAA GTCCAGCATN CACAAAGAGT 
GCTGCCAGTT GAGCGCCCAT CGTGCCTGCG CCAAGAACGG TTACTTTATT AATTGTCATA 
GTGATTCCTC CAATTTAGGT GAGGATAAGA TAACCATTAA GATAATTGGA ATAACGNTGC 840

TATTTTATNA AATTAATTAA GTATCTTTGA CAAGACATCT CAGNCTCTTT ATTTTAAGGA 900 

AAAAGCTTTA TGCTTAAAAT AAGTCTTTTT TAGTGAAATT AATGCATCTC ATATAATTAT 960 

TTGCTATTTA TACGAAAGCA GAATCTCCAG TCAAAGCGCG TCCAATTACT AAGGCATTAA 1020 

TTTCATGTGT ACCTTCGTAC GTGTAAATCG CTTCTGCATC AGAGAAGAAA CGTGCAATAT 1080 

CATAATCGTC AGCTAGTATG CCATTACCAC CTGTAATACC GCGGCCCATA GCTACTGTCT 1140 

CACGCAAACG TAAGGCATTC ATCATCTTCG CCGGTGAAGT TGCAACCTCG TCATATTCAC 1200 

CATGTGCTTG CATATTAGCT AATTGAGCAC ATGTTGCCAT TGCTTGAGCT AAATTACCTT 1260

GCATCATTGC TAGCTTNTCT TGTATTAACT GATATTTACT AATTGGGTNT GCCGAATTGC 1320 

TTACGCTCAA GTGACATAAT CTAATGTGGC ACGTAAAGCG CCAGCCATAC CACCTGTAGC 1380 

CATATAAGCA ACGCCTGCTC TCCGGTGGAA TAAAGAATTT TG 1422 



(2) INFORMATION FOR SEQ ID NO: 20: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 



ATGTACCTGT TTTAGGAC 



(2) INFORMATION FOR SEQ ID NO: 21: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21: 
GAGTCATTTA ACATATGG 18
(2) INFORMATION FOR SEQ ID NO: 22: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 811 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22: 

ATACTTTGAT TTTAGATGAA GCTGATGAAA TGATGAATAT GGGATTCATC GATGATATGA 60 

GATTTATTAT GGATAAAATT CCAGCAGTAC AACGTCAAAC AATGTTGTTC TCAGCTACAA 120 

TGCCTAAAGC AATCCAAGCT TTAGTACAAC AATTTATGAA ATCACCAAAA ATCATTAAGA 180 

CAATGAATAA TGAAATGTCT GATCCACAAA TCGAAGAATT CTATACAATT GTTAAAGAAT 240 

TAGAGAAATT TGATACATTT ACAAATTTCC TAGATGTTCA TCAACCTGAA TTAGCAATCG 300 

TATTCGGACG TACAAAACGT CGTGTTGATG AATTAACAAG TGCTTTGATT TCTAAAGGAT 360 

ATAAAGCTGA AGGCTTACAT GGTGATATTA CACAAGCGAA ACGTTTAGAA GTATTAAAGA 420 

AATTTAAAAA TGACCAAATT AATATTTTAG TCGCTACTGA TGTAGCAGCA AGAGGACTAG 480

ATATTTCTGG TGTGAGTCAT GTTTATAACT TTGATATACC TCAAGATACT GAAAGCTATA 540 

CACACCGTAT TGGTCGTACG GGTCGGTGCT GGTAAAGAAG GTATCGCTTG TAACGTTTGG 600 

TTAATCCAAT CGAAATGGAT TATATCAAGA CAAATTGAAG ATGCAAACGG GTAGAAAAAT 660 

GAGTGACTCC GCCACCTCAT CGGTAAGAAG TACTTCCAAG CACGTGAGGA TGACATCAAA 720

GGAAAAGGTG GAAACTGGAT GTCTTTAAGA GTCAAGAATC ACGCTGGAAA CGCATTCTTC 780

AGAGGTGGGT AAATTGAATT TTACGATGTG G 811 

(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs 
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:
GATGAAGCTG ATGAAATG 

(2) INFORMATION FOR SEQ ID NO: 24: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
TATCTAGTCC TCTTGCTG 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 960 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25: 

TAATTCGCAA TAGGAGTGAT GAATATCATA AATTTTACCC TCCAAATGAA GCTAATGAAG 60
TCCTGGACCC GAGTAAGACG CATGTAGCCA AGCTAAAATA ATCCACTCTA CCTTATCTTT 120
AGTTAATAAT GTTACTAAAT GTTGTTCATA CGCTGCTTTT GAATCAAATT GTTTTGGTTC 180
ATTAATATAA ACAGGAATAT CGTGCTTGTT TGCTCTATCT ATACAAAACG CATTTTGATG 240
ATCCGTATAT AGCNCCGTAA CTTCAATATT TTCAAGTTTT CCTGATTCAA CATGCTCAAC 300
TATATTTTCA AAGTTACTTC CTGAACCTGA TGCAAAAATC GCAATTTTAA CCATTGTTAT 360
ACCCCCAACA ATTCAATTGC AGTTGACTCA TTTTTCACAA TATGACCAAT TTGATAAGCT 420
TCCACATTTT GTTCTGCTAA AATCTTCAAA GCGCGTCGAT GCATCTTTTT CATCAACGAT 480
AACCGTATAG CCAATACCCA TGTTAAAAAT GTTATACATT TCATTTGTGT CTATATTGCC 540
TTGTTGTTGT AACCAATCAA ATATTTTTGG CGTTGGAAAT GATGTAGTAT CAATTCTAGC 600
AGCATATCCG GCTGGCAATG CACGTGGAAT ATTTTCATAA AAACCTCCAC CAGTAATATG 660
ATTCATTGCC TTAATAGAAA CTTCTTTTTT TAAAGCAAGT ACAGGTNTGA CATATAATTT 720
AGTTGGCTCT AAAAAGACAT CTATAAATGG ACGATTATCG NAGGGTGATG CCAAATCAAT 780
GNCTGATTCA NTAATTAATN TGCGCACTAA ACTGTNTCCA TTNGANTGAA TGNCACTTGG 840
ACGCAAGTCC TATAACAACT TGGCCCTCTT NCAATTCTTG AACCATCTTA CAATAGNCAA 900
CCTTTTTCAA CTGCTCCAAC AGCAAATCCG GCTACATCAT ATTCACCTTC GTGATACATT 960

(2) INFORMATION FOR SEQ ID NO: 26:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
ATAAGCTTCC ACATTTTG 18

(2) INFORMATION FOR SEQ ID NO: 27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:
GATAATCGTC CATTTATA

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 541 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:



GGCACGAGCG CTAAATAATT AATATTTAGT TTTTAAGTTA TTAATAACGT AGGGATATTA 60

ATTTTAAAAG AAGCAGACAA AATGGTGTTT GCTTCTTTTT TATGTCGTAT AAGTAATAAA 120

TAAAACAGTT TGATTTTAAA ATGAAAGCGT AAAAATGGTA AAATATCCCA AAATTGATTG 180

TGATATAATT ATAAGGAAAA TGAGCAATTT ATGAAAAAAG TTTACGNACA AATCGGAGAA 240

TTAAAACTAA ATAATTATCA AAACAACGTC AATATTTAGT TGAATACTCA GACTTTAGCC 300

CATGGCCAAG TGGGGAAGAC AGCATATATT AGTAAAGGTG AATGATTTGT TATTACTCAC 360

TCGAAAATAG AAAGACAAGA TTTTAACGAT TAAAATAAAC TATTTTACAA ATAAAGTAAA 420

ATTAATTTAT TANGCTAATA ATGCAAAAAA TTAAAAAGTA ATGGACAAAG AGATAATGAT 480

ATGGCTCAAG AGGTAATAAA ATAGAGGTGG ACGCACACTA AATGGGGAAG TTAATACAAG 540

G 541



(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:
GCACGAGCGC TAAATTTG 

(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30: 
CTTCCCCATT TAGTGTGC 

(2) INFORMATION FOR SEQ ID NO: 31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 2334 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 

= ~ ~ — c ™ c - 

GAATGAGGCA ATTTTAAAAC GAGCGATAAA TGGAAGTTAA »° 

CGATAGAAGA TCCGGATTTG AAAAAGAATC ATCWGAAAT "° 

AATCTAATCT CAAGTTCTGC AAAAGGTCAA O^rT " ""^ ^ 

AAATCAGGGT TATTAGCTAC ACCAAATGAA 2 3 °° 

GAGGAAACAA CACGCAAAT ZZZTc C^ ^ACGTGGCGT 360

AGATTTAGTC GTTTCTTTAG ATTTCAGAAT GACAGCAACA 420
CCTTTATATT CXGACATTGT TTTGCCAGCA GCGACTTGGT ATGAGAAGCA TGATTTGTCA 480

TCTACAGATA TGCATCCATA TGTACATCCT TTTAATCCAG CTATTGATCC ATTATGGGAA 540

TCGCGTTCAG ACTGGGATAT TTATAAAACG TTGGCAAAAG CATTTTCAGA AATGGCAAAA 600

CAAGAAATTT CTGGAACGTT TAAAGATGTT GTGACAACTC ™~ TGATACAAAG 660

CAAGAAATTT CAACACCATA CGGCGTAGTG AAAGATTGGT CGAAGGGTGA AATTGAAGCG 720

GTACCTGGAC GTACAATGCC TAACTTTGCA ATTGTAGAAC GCGACTACAC TAAAATTTAC 780

GACAAATATG TCACGCTTGG TCCTGTACTT GAAAAAGGGA AAGTTGGAGC ACATGGTGTA 840

AGTTTCGGTG TCAGTGAACA ATATGAAGAA TTAAAAAGTA TGTTAGGTAC GTGGAGTGAT 900

ACAAATGATG ATTCTGTGAG AGCGAATCGT CCGCGTATTG ATACAGCACG TAATGTAGCA 960

GATGCAATAC TAAGTATTTC ATCTGCTACG AATGGTAAAT TATCACAAAA ATCATATGAA 1020

GATCTTGAAG AACAAACTGG AATGCCGTTA AAAGATATTT CTAGCGAACG TGCTGCTGAG 1080

AAAATTCGTT TTTAAATATA ACTTCACAAC CACGAGAAGT AATACCGACA GCAGTATTCC 1140

CAGGTTCAAA TAAACAAGGT CGACGATATT CACCATTTAC AACGAATATA GAACGTCTAG 1200

TACCTTTTAG AACATTAACA GGACGTCAAA GTTATTATGT GGATCACGAA GTTTTCCAAC 1260

AATTTGGGGA GAGCTTACCA GTATATAAAC CGACATTGCC GCCAATGGTA TTTGGGAATA 1320

GAGATAAGAA AATTAANGGT GGTACAGATG CTTTGGTACT GCGTTATTTA ACGCCTCATG 1380

GANAATGGAA TATACACTCA ATGTATCAAG ATAATAAGCA TATGTTGACA CTATTTAGAG 1440

GTGTCCACCG GTTTGGATAT CANATGAAGA TGCTGNAAAA CACGATATCC AAGATAATGA 1500

TTGGCTAGAA GTGTATANCC GTAATGGTGT TGTAACGGCA AGAGCAGTTA TTTCGCATCG 1560

TATGCCTAAA GGTACAATGT TTATGTATCA TGCACAAGAT AAACATATTC AAACGCCTGG 1620

GTCAGAAATT ACAGATACAC GTGGTGGTTC ACACAACGCG CCGACTAGAA TCCATTTGAA 1680

ACCAACACAA CTAGTCGGAG GATACGCACA AATTAGTTAT CACTTTAATT ATTATGGACC 1740

AATTGGGAAC CAAAGGGATT TATATGTAGC AGTTAGAAAG ATGAAGGAGG TTAATTGGCT 1800

TGAAGATTAA AGCGCAAGTT GCGATGGTAT TAAATTTAGA TAAATGCATA GGATGCCATA 1860

CGTGTAGTGT GACATGTAAA AACACTTGGA CAAATCGTCC AGGTGCTGAG TAACATGTGG 1920

TTCAATAACG TAGAAACGAA GCCAGGTGTA GGGTATCCGA AACGTTGGGA AGACCAAGAA 1980

CACTACAAAG GTGGTTGGGT ACTAAANTCG TAAAGGGAAA CTTGAATTAA AATCTGGAAG 2040

TAGAATTTCA CAAATTGCTT TAGGTAAAAT TTTTTATAAC CCAGATATNC CATTAATAAA 2100

AGATTATTAT GANCCATGGA NCTATAATTA TGAACATTTA ACAACTGCGA AATCAGGGAA 2160

GCATTCGCCA GTTGCTAGAG CGTATTCAGA AATTACAGGG GATAACATTG AAATTGAATG 2220

GGGACCTAAC TGGGAAGATG ACTTAGCAGG TGGTCATGTT ACAGGCCCAA AAGATCCTAA 2280

CATACACAAA ATAGAAGAAG AGATTAAATT CCAATTTGAC GAAACTTTTA TGAG 2334



(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32: 
ATTGATCCAT TATGGGAA 

(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33:
CATATTGTTC ACTGACAC 

(2) INFORMATION FOR SEQ ID NO: 34:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 638 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
AGTTATTGTA TTTAAAAATG TTTCATTTCA ATATCAAAGT GATGCATCCT TCACATTGAA 60
AGATGTTTCT TTTAATATAC CTAAAGGTCA GTGGACATCT ATTGTTGGTC ATAACGGTTC 120
TGGAAAATCT ACAATTGNCA AGTTAATGAT TGGCATAGAG AAAGTTAAAT CTGGAGAAAT 180
TTTTTATAAT AATCAAGCTA TAACTGATGA TAATTNTGAA AAGTTAAGAA AAGACATAGG 240
AATTGTATNT CAGAATCCGG ATAATCAATN TGTTGGNTCA ATTGTAAAAT ACGATGTGGC 300
ATTTGGACTC GAAAATCATG CGGNTCCACA TGACGAAATG CATAGAAGAG TCAGCGAAGC 360
ACTTAAACAA GTTGATATGT TAGAACGTGC AGATTATGAC CCTAATGCAT TATCGGGGGG 420
ACAGAAGCAG CGTGTGGCTA TAGCAAGTGT ATTAGCACTT AACCCTCTGT CATTATATAG 480
ATGAGGCGAC TCTATGTTAG GATCCCTGAT GCACGTCAAA TTTATGGGAT TTAGNGAGAA 540
AGTAANTCAG ACATTATATA CAATCATTCT ATACGCATGA TTTATCTGAG GCGATGAGNA 600
GATCAAGTAT CCGTATGATA AGGACTTNCT TTTAAGGC 638

(2) INFORMATION FOR SEQ ID NO: 35:



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs

(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic cDNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:35: 



GTTTCATTTC AATATCAA 

(2) INFORMATION FOR SEQ ID NO: 36: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36:
ATCTATATAA TGACAGAG 18

(2) INFORMATION FOR SEQ ID NO: 37: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 1496 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:37: 
GTTAATCAAG TATCGAAGCG GAACAATCAT ACTTTAATGT TGAAGATTTA TATNGCGAAC 

AAGCGATGGT CCTACTGCOT aatattaatt tagcactgcg cgcacaatat ttgttngnat 

CTNATGTCGA TTACTTTGTA TATNNTGGTG ATATTGTTTT AACTGACCNC ATTACAGGTC 

gtntgttacc GGNAACTAAG TTGCAAGCTG gacttcacca ngctattgaa cccaaZg 

GTATGGAGGT TTCAACAGAT AAAAGTGTTA TGCCAACCAA TTACCCTTCC AGAATTTATT 
TAAACTTTTT GAATCAATTT TCAGGTATGA CAAGCTACAG GAAAATTAGG CGAATCAGAG 
TTCTTTGATT TGTATTCANA AATAGTCGTA CAAGCACCCA ACTGATAAAG CGATTCAACG 
TATCGATGAA CCAGATAAAG TGTTTCGTTC AGTTGATGAG AAAAACATCG CGATGATTCA 
TTGA TATAGT TGAACTTCAT GANNCGGGGC CGACCGGTTT TACCTCATAA CCGAGNACTG 
CTGAAGCGGC TTGAATACTT TTCNGAAGTA TTATTCCAAA TGGATATTCC TAATAATTTA 
T CA TTGCGC ^ ^ CAGATGATAG CTGAAGCAGG COUI^ 

gtcZca" TTGCGACTAG TATGGCAGGT CGAGGCACAG ATA — ~- 

GTCGAAGCAT TAGCTGGATT AGCTGTTATT ATTCATCAa/- 

rmn^« ATTCATGAAC ATATGGAAAA TAGCCGTGTA 

iT " " CGT<:Gra TTC ™ TAGA ™"»™ «*=«CATC TTGTATATAT 

x^r ™ t " ,gntmgcm ~™ ™»™« «jz 

M^Z™ r" ,CMCMm ™™™ ™« TCGNAAAGTT 
AAGCAAATTG TAGTTAAAGC GCAGCGTATC Trrrnnnr-** ^ 

AAATGGrTT* „ w TCGGAAAGAA CAAGGGGTTA AAGCTCGGTG 

AAATGGCTTA ATTGAATTTG NNAAAAAGCA TNAGTATTPA r^n»™ 
rAMrr „^ NAGTATTCA GCGAAGATCT TNGTATTTAC 

GANGGAACGC AAATCCGAGT TTTTAGAAAT TAGATTGATr rrr^. 

BMrWj AGATTGATG CTGAGAATCC NAGATTTTTA 

ANGCGGTTAG CTTAAAGATT GTATTTGAAA TNGTTTGGrr Man^n 

Arillaft ^ I NGTTTGGGG NAATGANGGA AANGGTGCTA 

ACAAAATCGC GNGTTGGGCG AGTATATTTT ATPAMaw * 

ATCAAAAATT TAAGTTNCCA ATTTAATAAA 
GATGTGGCTT GTGTTAATTT TAAAGATAAr AATAAA 
TTTGAAAArr TAAAGATAAG CAAGCAGNAG TGACATTTTT ATTAGAGCAA 

TTTGAAAAGC AATTAGCTTT GGANTCCGTA AAAACATGCA ANGNGCATAT TATTATAATA 
1260
TTKCCGGCCA AAANGTCTTT NGGGAAAGCA ATTGATNCAA GTTGGGGTTA GGAACAAGXC 1440
GGCTTTTNAC AACAANTTAA NAGCAAGCGN TAATCAAACG ACAAAANTGG CAACCT 1496 

(2) INFORMATION FOR SEQ ID NO:38: 



(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38:
CCGCTAAATT ACTATCGC 18

(2) INFORMATION FOR SEQ ID NO: 39:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:39: 
CTGAAGCGGC TTGAATAC 18

(2) INFORMATION FOR SEQ ID NO: 40: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 955 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: Genomic cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 40:



ATATAAATTA TTTAAGCGTA TGGTTTTACT TCGATTGCAC CCTTCATTTT CATCATTGAA 60
CACCATGCTT AATATAATCC ATATATTTGT GGCTCTAAAG NCTTTCCTCC CACCGTATAA 120
TGTCTGCTGC TTTTTCAGCT AACATTAAAA CAGGTGCGTG TATATTGCCA TTTGTCGTAC 180
GTGGCATAGC GGATGCATCA ACTACACGTA AATTTTCCAT ACCGTGGACT TTCATTGTTA 240
ACGGGTCAAC TACTGCCATT GGATNCTGAA GCAGGACCCA TTTTAGCACN ACAAGATGGG 300
TGTAATNCTG TTTCACCATC TCNACGGAAN NCAATCAAGN ATTTCTTCGT CTGTTTGCAC 360
TTCTGGGTCC TGGGTGAAAT TTCTCCACCA TTGAATGGAT CCATTGCTTT TTGAGATAAG 420
ATATTTCTTG CTACACGAAT TGCTTCTACC CATTCTNTTT TATCTTCTTC TGTTGATAAA 480
TAATTAAAGC GGATACTTGG TTTTTCGAAT GGATCTTTAG ATTTGATTGG CACGAGCTAC 540
CACGAGAGTT TGAATACATT GGTCCTACGT GAACTTGATA ACCATGTGCG ACCGCTGCCT 600
TTTGACCATC ATATCTTACA NCTATTGGTA AGAAATGGAA CATTAAGTTA GGATAATCAA 660
CTTCGTTATT TGAACGTACA AATCCGCCAC CTTCAAAATG GTTAGATGCT GCTGCACCTG 720
TACGTGTGAA AATCCAGTGG TAAACCAATT AAATGGCATG CGCCTTGATA TCTAAGCTTG 780
GCTGTAATGA TACAGGTTTC CTTACATTTA TGTTGAATGT ATACCTCTAA GTGATCTTCC 840
AAAGTTTTCA CCCACACCTG GTAAATGAAC ACGTGGCTCA ATGCCTTTTG ATTTTAGGAA 900
CTCTGAATCA CCGATACCAG ATAATTGTAG TAATTGTGGC GTTATTGAAT GCCCC 955

(2) INFORMATION FOR SEQ ID NO: 41:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 18 base pairs
(B) TYPE: nucleic acid

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: Genomic cDNA
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 41:
GAAGCAGGAC CCATTTTA

(2) INFORMATION FOR SEQ ID NO: 42:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:42: 



GATTTTCACA CGTACAGG 18



(2) INFORMATION FOR SEQ ID NO: 43: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 497 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 43: 



GAATTCCTAC ATAATACTTT TGTTTACCTT GTGTCAGTTT ATACAACGGT GGCTGTGCAA 60
TATACACATA GCCTGCTTCA ATTAACGGTC TCATAAATCG ATAGAAGAAT GTTAATAACA 120
ATGTTCTAAT ATGCGCTCCA TCCACATCGG CATCAGTCAT AATGACGATT TTGTGATATC 180
TTGCTTTCGC TAGATCAAAG TCGCCACCGA TTCCTGTACC AAATGCTGTG ATCATTTGAC 240
GAATTTCATT GTTATTCAAA ATTCTATCTA ATCGTGCTTT NTCAACATTT AATATCTTAC 300
CTCGTAATGG TAAAATCGCC TGCGTTCTAG AGTCACGACA GATTTTGGTG GACCCCCNGC 360
AGAGTCCCCT TCGACTAAGA AAATCTCACA TTCTTCAGGA CTTTTACTAG AGCAATCGGC 420
TAATTTACTG GAAGACTGCT ACATCTACGC TGATTTACGA GGTGTTACTT CAGGGCTTTN 480
TCGAGACACG TGCANGT 497



(2) INFORMATION FOR SEQ ID NO: 44: 




(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 44: 
CATAATACTT TTGTTTACC 19

(2) INFORMATION FOR SEQ ID NO: 45: 

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:45: 
AGTAACACCT CGTAAATC 18

(2) INFORMATION FOR SEQ ID NO: 46: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1443 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO:46:

CTANCNAANG GAANTTCAGC ATCCTTAAAA ATACCTATTT GACTGTAGAA ACCTTTTGNT 60 

GCGTACAATA TCTAAACCTT GTCGTGCTGC TGGAACTGCA CCTGAACATT CAACAACAAC 120 

ATCTGCACCG TAACCGTCTG TAATTCCATT GATATACGTT TTTAAGTCTG TGTGTTGTAA 180

ATTGACTACA TAATCCATGT GCAATGCTTC TGCTTTATCT AATCTGACTT NGTGGCANTG 240 

TCCAATCCAG TTACCACAAC AGGTGCGCCT TTACTTTTCA ACACTTGTGC TACAAGTAAT 300 

CCGATTGGCC CAGGTCCCAT TACAACTGCT ACATCGCCAG AGTTCACTTG AATCTTAGAA 360 

ACGCCATGAT GTGCACATGC TAATGGTTCT TGTCATAGCT GCAGACTGAT ACGATACTTC 420 

CGCTTCTGGA ATATGATNCA AACTTTCTTC ACGTGCAATG ACATAATTAG TAAATGCGCC 480 

ATCAACTTGT GTTCCAATAC CTTTTCGATG GTTGCATAAA TGATAGTTTT TTGATTTACA 540

GGAATCACAC TCATTACANA CCATAGAATG TAGTTTCAGA AGTGACNCGG TCACCAACTT 600 

TAAAATCNTT AACGTCTGCT CCCAACTTCA ACGATNTCAC CAGAAAATTC ATGACCTAAT 660 

GTCACTGGAA AATTAACTTN ATAATGCCCT TCATAAGTAT GAAGGTCTGT GCCACAAATT 720 

CCTGCATAAT GTACTTTAAT CTTTACTTTA TCATCTAGCG GTGTTGCAAC TTCTTTATCA 780 

AGAAGTTCTA AGTTGCCATG TCCTTCTCTT GTTTTTACTA AAGCTTCCAC CACAAACACN 840 

TCGANTTTTT ANTTGNAATA GACTNNATAG NTTNAAGATA AGATAGTTAN CGATATTNCC 900 

ACCTTGATCA ATACTTGANA TTTCAGATGA ACCTTTTGNC ATTTGTACAT TCGTACCTTT 960 

CGCCATATCT GTGAAAATGG GTGCTACGTC TGTTGCAATA TATAATGAAA TTGCAATCAT 1020 

AATCGTACCC ACAATGACAG AATGAATAAT GTTTCCTCTT GCTGCACCAA CAATAAACGC 1080 

GACAACAAAT GGTATAGTTG CTAAGTCACC AAAAGGTAGT ACTTGGTTTC CTGGTAAAAT 1140 

AACGGCTAAT AAAACAGTGA TAGGTACTAA AATTAATGCT GTCGAAATAA CCGCTGGATG 1200 

ACCTAATGCT ACAGCCGCAT CCAATCCAAT ATAAATTTCA CGTTCGCCAA AACGTTTATT 1260 

TAGCCATGTT CTTGCAGACT CTGAAACTGG CATTAAACCT TCCATTAAGA TTTTTACCAT 1320 

TCTAGGCATT AAGACCATTA CTGCAGCCAT TGACATTCCT AAATTAATGA TGTCTCCAGG 1380 

TTTGTAACCT GCTAACACAC CAATACCTAA ACCTAAAATT AAGCCGACAA ATATAGACTC 1440 
TCC 1443

(2) INFORMATION FOR SEQ ID NO: 47: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 47:



GTTCTAAGTT GCCATGTC 



(2) INFORMATION FOR SEQ ID NO: 48: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 48: 



CCTAGAATGG TAAAAATC 18

(2) INFORMATION FOR SEQ ID NO: 49: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 1642 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 49:



CCATTTAAAA GTATTGTAAA ATCATCCACN TTNTATAAAC CAACCACNTT AACNTTTTTG 60
ACATTTGTTA TCCGATGAGA TTAAAAGATA TCAATNAATA CAATTTTTAN AATTAATGTC 120
ACTATGTTTT CCGATAATAT NACCCAATCA TCGNAATGTT ACCCATTTAT AAAATGANAA 180
ATCNTTGACA TAGGTANAGG GAATGTATAT TGGTCNCGGA TCACTTAAAT TAAACCCANA 240
TCATGTCATC TGGTAATGTN TCAATGTTAA TTGCTCCTGA AGCGGCGTAN ACTTTAATCT 300
TCCATGTTAA ATGAGTAAAT TGATGCGTCA ACTCNAAAAT AGGTGTTTCT NCTGGNTGAA 360
TGTCATGACC GATTTTTTCA NTCATTTTAC GTCTANCATG CTCACTATCN AACATAGGAN 420
ATTGCCACAT ACCATACNAT AATTNTTCCC TACGCTTTTG CAACAGATAT TGACCTTGAT 480
TATTTCTAAT TAANAAGACG GATTGCTCAA TTACNTTTTT ACTTACATTT TTAGATTTAA 540
CAGGTAACTT TTCAAATGGA CCTTTATCAA ATGCCTCACA GTTTTCTTGN ACTGGACNAA 600
ATAAGCATAA TGGATTTTTT GGTGNACAAA TTAATGCCCC TAATTCCATC ATAGCTTGAT 660
TAAACGTTCC AGCTTCTGTA GTAACATACG GTAACAATTC TTGTTCGTAC GATTTCCTCG 720
TCGATTGTAA TTTAATATCT CGATAGTCAT CATTCAATCT AGACCATACG CGAAAAACAT 780
TTCCGTCTAC AGTTGCTAGT GGTACATTAT ATGCAATGCT CATTACTGCA GCTTGTGTGT 840
ATGGGCCAAC ACCTTTTAAC GCTTTAAATT GATCAGGATC TTTGGGAACT AAGCCTTCAT 900
ATTTATCANA AACTTCTTTA ATCGCCGTAT GAAAATTTCG AGCTCTACTA TAATATCCTA 960
AGCCTTCCCA ATACTTTAAC ACTTCATCTT CCGAAGCTTG ACTCAAAACT TCCACAGTTG 1020
GAAATCGGNC ACCAAAACGA TGATAATAGT CAATAACTGT TTTAACTTGT GTCTGTTGTA 1080
ACATGACCTC ACTTAACCAA ATATAGTACG GATTGGTCGT TTGTCGCCAT GGCATTTCTC 1140
TTTGATTTTC ATCAAACCAG TGTATCAAAT TTTCTTTAAA ACTAGACTGC TGATACATTT 1200
ATAAAACCCT TTCCTCACCA AAATTAATTG TCTTTACTCA TAATGTTTTT ATTGTACATT 1260
AAAATCATGG TTAGTATGTA AGTTAATTTA GTTATNTGCG AAATTGGATT ATAATAGTAT 1320
ATATAATATT ATGAAATGAG TGAACTGATA TGGACACTGC AACACATATC GCAATTGGGG 1380
TGGGCCTTAC AGCACTTGCA ACTCAAGATC CAGCAATGGC TTCTACGTTT GGTGCAACAG 1440
CTACAACCCT TATCGTTGGT TCATTAATTC CTGATGGGGA TANTGTNCTT AAATTANAGG 1500
ACANTGCAAC ATATATTTCG NATCATAGAG GNATNACGTC ATNCCATCCC CTCCCACAAN 1560
NNTATGNCCA GTCNCNTTTA CANTTTNTAT NTNTTCACGT CACTNTNGCT GGTANGCATC 1620
CCNCCTCACG TATGGCTTGT GG 1642

(2) INFORMATION FOR SEQ ID NO: 50:



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO:50: 
TCCTGAAGCG GCGTATAC 
(2) INFORMATION FOR SEQ ID NO: 51: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 18 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 51: 
TATGAAGGCT TAGTTCCC 



(2) INFORMATION FOR SEQ ID NO: 52: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 514 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 52: 



(2) INFORMATION FOR SEQ ID NO: 53: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 53: 
GAGGTCATTG CAGCTTGC 

(2) INFORMATION FOR SEQ ID NO: 54: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 54: 
CAAATCCTAA GTTACTCATT 

(2) INFORMATION FOR SEQ ID NO: 55: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 479 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 55: 

CGCACATAAC GTGCAGCATA TGCAGCTGAG CGGTCTACTT TTTGTAGGAT CCTTACCACT 60 

GAAGCATCCG CCACCATGAC GTGCATAGCC ACCATACGTA TCAACAATGA TTTTACGTCC 120 

TGTTAATCCT GCATCACCTT GAGGTCCACC GATTACAAAG CGTCCTGTAG GATTGATGTA 180 

GAATTTAGTT TGTTCATTAA TCAAGTTTTC TGGAACAGTT GGATAAATGA CATGCGCTTT 240 

GATGTCTTCT TGAATTTGTT CAAGTGTCAC ATCATCAGCA TGTTGTGTTG ATACGACAAT 300 

CGTATCAATA CGTACTGGGT TATCATTTTC ATCATATTCA ACAGTGACCT GAACTTTACC 360 

GTCTGGTCGT AAATAATTCA ACGTCTCGNG CCATCTTTTA CGCACATCAG ATTAAACGTT 420 

TGGGGCAATT GGGTGTGATA AATTAAATTG CTAGAGGGAT GTACGTTTCT TGTTTCAAT 479 

(2) INFORMATION FOR SEQ ID NO: 56: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 56: 
ACGTGCATAG CCACCATA 18 
(2) INFORMATION FOR SEQ ID NO: 57: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 57: 
ACAAGAAACG TACATCCC 

(2) INFORMATION FOR SEQ ID NO: 58: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 857 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 58: 

ACAACCCTNC AGTGCTTGGC CAATTAGGTA GAGAATTTNA CCTAGGTAAN TTAATGCGAT 60 

AAAGCCCAAG TTTGTAAAAT GTCCNTTGTG CGCCAATTTG TTCCTGTACN TANTGGGANC 120 

TATTTTAGGA TTCTTATCAG GGATATTTCC CAAGGGTTTT GTTGACNCCT TAATCATGCG 180 

TGCGTGTGAT GTTATGTTGG CAATTCCCCA AGTTATGTTG TAACGTTAGC ATTAATTTGC 240 

ATTGTTTGGA ATGGGTGCCG AAAATATTAT CATGGCATTT ATTTTGACGC GTTGGGCATG 300 

GTTCTGTCGT GTTATACGTA CAAGTGTTAT GCAGTACACT GCTTCTGACC ATGTCAGATT 360 

TGCTAAAACA ATCGGTATGA ATGATATGAA AATTATTCAC AAACATATTA TGCCGTTAAC 420 

ATTAGCAGAT ATTGCTATCA TCTCTAGTAG TTCGATGTGT TCAATGATCT TGCAAATATC 480 

TGGCTTTTCA TTTTTAGGAT TAGGTGTCAA AGCGCCTACT GCAGAGTGGG GCATGATGCT 540 

TAACGAAGCT AGAAAAGTGA TGTTTACACA TCCTGAAATG ATGTTTGNGC CAGGTATTGC 600 

CATAGGGATT ATAGTGATGG CATTTAACTT CTTATCCGAT GCTTTACAAA ATTGNTATTG 660 

GATCCCCCGC ATCTCTTTCT TAAAGATAAA CTTCCGCNCC TTGTGAAAAA AGGGAGTGGN 720 

GCAATCATGA CATTGTTAAC AAGCTAAGCA TTTGGCGATT ACAGATACCT GGACAGATCA 780 

ACCACCGTGA GTGATGTGAN TTTNNCAATT AACTAAGGGG TGAAACTCTA GGCNTTATTG 840 

GGGAAAGTGG TAGCGGT 857 

(2) INFORMATION FOR SEQ ID NO: 59: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 59: 
ATATTATCAT GGCATTTA 18 

(2) INFORMATION FOR SEQ ID NO: 60: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 60: 

ATCTTTAAGA AAGAGATG 18 

(2) INFORMATION FOR SEQ ID NO: 61: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 593 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 61: 

GAATTCTTGC ACATGTTGCT CGGTGTCTTC CTTGCTGCAC TTGTATCATT CGTTGTAGCT 60 

GCTTTAATTA TGAAGTTCAC TAGAGAACCA AAGCAGGATT TAGAAGCTGC GACAGCTCAA 120 

ATGGAAAATA CTAAAGGGAA AAAATCAAGC GTTGCTTCTA AGTTAGTATC TTCTGATAAA 180 

AATGTTAATA CAGAAGAAAA TGCTAGTGGT AATGTTAGTG AAACATCTTC ATCAGATGAT 240 
GATCCTGAAG CGCTATTGGA TAATTACAAC ACTGAAGATG TTGATGCACA CAATTACAAT 300 

AATATAAATC ATGTTATTTT TGGCTGCGAT GCGGGTATGG GTTCTTNGGT GCAAATGGGG 360 

TGCAAGCATT GTTACNGTNA TTAAATTTTA AAAAGGCGGC AATTAATGAT ATTACAAGGG 420 

TACAAATTAC TGCGAATTAA TCAAATTGCC AAAAGATGCT CCAATTANGN TATCAACTCC 480 

AGAAAAACTA CTTGATCCGG GCTATTAACA AACACAATGC CATCCATATT CNAAGGGGNT 540 

TAATTTCCTA ATCACCAAGA TATGNAGGAC TTTTAATTAT CTTAAAAAGG TGG 593 

(2) INFORMATION FOR SEQ ID NO: 62: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 18 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 62: 
TGCACATGTT GCTCGGTG 18 
(2) INFORMATION FOR SEQ ID NO: 63: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 63: 
GTGGTAATGT TAGTGAAAC 19 
(2) INFORMATION FOR SEQ ID NO: 64: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 425 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 64: 

GGCACGAGCG AGTTCATTAG CTATATATAA GCCTAATCCA GAACCACCCG TTTTTGTATT 
ACGAGAGTTT TCTACTCTGA ATGTACGTTC GAATATACGT TCTTGTAGTT CTGGTATAAT 
GCCAATACCT CNATCGCTAA TAGCAATGTC GATAGTATCT TGATCTTTGT TTTCACTAAT 
ATTAATATCA ATGCGACTAC CAACATTTGA AAATTTTAGC GCATTATCAA GTAAGTTTGT 
TAAAATACGC TCAAGTGGCG TTCGATATTG ATAAAATGCA TCAATTTCGC TACAGAAATT 
CACTTCTAAT GTGCGGTTTT CATGTTTGAT ACGTTGCTCC ATATGGTTGC AATATTGATA 
CAAGTAATTG GTCTAGTTGT ATTAATTCTG GGGGATATGT TTTACCTGTA TTTAAAGTTG 
ATAAT 

(2) INFORMATION FOR SEQ ID NO: 65: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 65: 



TATAAGCCTA ATCCAGAACC 20 

(2) INFORMATION FOR SEQ ID NO: 66: 



(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 20 base pairs 
(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 66: 
AACGTATCAA ACATGAAAAC 

(2) INFORMATION FOR SEQ ID NO: 67: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 465 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 67: 

GTACGAGCTC GTGCCGGCAC GAGCGATTGG TGCAGTGAGT TATGTTTTAG AACAATTAGA 
TGCACCAGTA TATGGATCTA AATTGACAAT AGCGTTAATT AAAGAAAATA TGAAAGCCCG 
TAATATTGAT AAAAAAGTTC GCTACTACAC AGTTAACAAT GATTCAATTA TGAGATTCAA 
AAACGTGAAT ATTAGTTTCT TTAATACGAC ACACAGTATT CCTGATAGTT TAGGTGTCTG 
TATTCACCCT TCATATGGTG CCATTGTGTA TACAGGTGAA TTTAAGTTTG ACCAAAGTTT 
ACATGGACAT TATGCACCAG ATATTAAACG TATGGCAGAG ATTGGTGAAG AAGGCGTATT 
TGTCTTAATC AGTGATTCTA CTGAGGCAGA GAAACCTGGA TATAATACTC CCGGAAAATG 
TAATTGAACA TCATATGTAT GATGCCTTTG CCAAAGTGCG AGGTC 

(2) INFORMATION FOR SEQ ID NO: 68: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 68: 
TTTAGAACAA TTAGATGCAC C 21 

(2) INFORMATION FOR SEQ ID NO: 69: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 69: 

TCCGGGAGTA TTATATCCAG 

(2) INFORMATION FOR SEQ ID NO: 70: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 527 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 70: 

GGCCCAAACC CATCCAAGTC CTTTTTAATT GACTTATTTA CATTATTTCT TTAATTTGGA 60 
TTAACAAATT TTTTTCTATT TGANCCCTTT AATGTTNACT CCCCGTATCT AACAAGCAAG 120 
TGATCATACT TCATTATTTT AGCAACTCCT TAATTTCCTC ATAAATGATG ATAAATATTT 180 
CTTTAAACCT TGCTATATCT TCTTTAGTTG TAGTAGCCCC AAATGATAAT CTTATACTAC 240 
CTTCAATAGA TTTGTCTGAT AATCCCATTG CAGCCAATAC TTCATTTAAT TTATTACGTT 300 
TAGATGAACA AGCACTCGTC GTAGATATCA TAATGTCATA TTTTGAAAAA GCATTAACTA 360 
ATACTTCACC TTTTACGCCA GGAAAACTAA GATTTAAAAC GAATGGTGAA CCTGAAGTTG 420 
AAGAATTAAT ATAAACTCCA TGATATTTAT TTAAAAATTG ACGGACGTCA TTATTTAACT 480 
CAGTAACAAA TGCATTCAAT GCTTCAAAGT TTTCATTAGC TCGTGCC 527 



(2) INFORMATION FOR SEQ ID NO: 71: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 24 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 71: 
TTTTAGCAAC TCCTTAATTT CCTC 

(2) INFORMATION FOR SEQ ID NO:72: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 72: 
GCACGAGCTA ATGAAAACTT TG 

(2) INFORMATION FOR SEQ ID NO: 73: 
(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 811 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 73: 

GACAACTTCC TAAAGCACGT GATGAAAAAG TAAGTGAATA TGGAATTGAA CAAGCTGATG 

TACCACC TCM?ATGAT AGTGAAGCCA ACATTTTAAT GTGAATTTTA 

TACCACCTGC TATGCGAGAA GATGGTAGCG AATTTGATAA AGATCTAAGT AATATCATTA 
CATTAGATGA TATTAATGGT GATATTCATA TGCATArnnr- ™ "ATATCATTA 
CTAT'crArA ™™ "•'TCATA TGCATACAAC GTATAGTGAT GGTGCGTTTT 

<-TAT i CGAGA CATGGTAGAA GCAAATATrr ™ K 

ATCATTCAra aa™™ GCAAATATCG CAAAAGGTTA TAAATTCATG GTAATTACTG 

AAArr GTTGCTAATG GCTTACAAGT GGAAAGACTT TTTANGACAA 

AAACGAAGGA AATTAAGGCT TTAGATAAAG AATATAGTGA AATTGGATAT TTATTCAGGT 
ACMGAAATG GATATATTAA CCTGATGGCT CGCTGGATTA TGATGATGAA ATTTNAGCAC 
AACTTGGATA TGTNATTGGA GCTATTCAAC AAAGCTTNAN CCAATCAGAA GAACAAATNA 
T GAACGGAT TAGCTAATGC ATGTCGCAAT CCATACGTGC GACATATAGC GC^CA 
G ™ fl TAGGTAGAAG AGATGGTTAT AAACCGAATA TTGAACAATT AATGGCATTA 
GCTGAAGAAA CGAATACAGT ATTAGAAATT AATGCCAATr n» "^CATTA 
„-„,„,., AATGCCAATC CACATCGACT GGATCTTGAA 

CGCTGAAATC GNTCGNNAAT ATCCAAATCT rAaa,.,, "wluiTGAA 

n i (-CAAATGT GAAATTAACT NTTAACACTG ATrrrrnT^n 
TNCAAATCAA TTNGATTTTN TGGAATTATG G ATGGGCATCA 

(2) INFORMATION FOR SEQ ID NO: 74: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 74: 



ACGTGATGAA AAAGTAAGTG 

(2) INFORMATION FOR SEQ ID NO: 75: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 22 base pairs 
(B) TYPE: nucleic acid 
(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 75: 
TCTTGTACCT GAATAAATAT CC 

(2) INFORMATION FOR SEQ ID NO: 76: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 681 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 76: 



AGATCGTTCG CTAATTGACA ATTGATTAAA TCGGGTATTA CAAAATTGGA TATTACCTGT 
■rATATCTAAA AATCCACAAA TTGCTTTAGC AAGTGTTGAT NTGNCGGCAC CATTGTGACC 
AACTATACTA AGCATTTCTC TTCTATAAAC ATTTAATTGA ACATTATTAA GTHCACTATT 
AG.ATAGTGA CTATATTGAA CACATACCTC ATTTAATTCT AATAGCGGCN C jATGTGTA 
GTTATTATG, TTATGTGCAG ATGTNTCATC TATGGATTTh NN CAGTTTAA NTTTAACATG 

ttgagtgata gaaacgagag gtaa»ttggc taagttatga ATGGATTGGA GATGTAGTTC 
tg»,ta™ aggggtg»ag agtataatgg MACACGTATG GGTGGTTGTT taagcttaga 
TGATTTTAGC AAATGAG.AG GGG T ,G t A TT agcG„GA IT TTT gga IOT rZZ^ 

mgtctatga aaggtatcat gtaatga^c ttctaatcga tgttggagaa 
TGACTTTGTT TCTTCATGAA TATTGTNTM CAATCTCAGC GTTTCATGTC CTGTCGCAGG 600 
ATCTAAATTG GCCAGCGGCT CATCCAATAT TAAAATAGGC GTNCGATGGA TTAATATACC 660 
ACCTAATGAA ACGCTCGTGC C 681 



(2) INFORMATION FOR SEQ ID NO:77: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 23 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 77: 
AATTGACAAT TGATTAAATC CCC 

(2) INFORMATION FOR SEQ ID NO: 78: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



(ii) MOLECULE TYPE: Genomic cDNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 78: 
GCCAATTTAG ATCCTGCGAC 20 

(2) INFORMATION FOR SEQ ID NO: 79: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 164 amino acids 

(B) TYPE: amino acid 
(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:79: 

Met Gly Met Val Ala Val Xaa Val Cvs Thr p „ 
! CyS Thr Pro Pro lie Gly Leu Gly 

10 

Leu Ala Thr Xaa Val Xaa Lys Tyr Lvs Ph , 

y V LyS Phe Asn «^ Ser Glu Arg Glu 
25 

Met Gly Lys Ala Xaa Phe Thr Mar ri » 

-, M6t Gly Leu p ^e Gly He Thr Glu Gly 

40 45 
Ala He Pro Phe Ala Ala Gin Aso Prn , 

50 ASP Pr ° Leu A *9 lie Pro Ala Asn 

55 6Q 

He He Gly Ala Met lie Ala Ser Val U e Ala x a * ti n 

65 lie Aia Xaa He Gly Gly Val 

7 " 75 
Gly Asp Arg Val Ala His Gly Glv Prfl T i „ , " 

y G1V Pr ° Ile Va i Ala Val Leu Gly Gly 

85 90 95 

He Asp His Val Leu Trp Phe Ile Ph= ni 

P Phe lie Phe Gly Xaa Ile Val Gly Ser Leu 

105 

Val Thr Met Pro Thr Val Leu Leu Leu Xaa Am a Th U ° 

u Xaa Arg Asn Thr Pro Val Ile 

i15 120 

125 

Ala Val Asp Ala Pro Ala Gin His Thr ri , 

130 18 THr Gin Leu Hi. Asp Thr Asp lie 

135 140 
Thr Gin His Asp Thr Glu Val A<?n i„ „ , . 

U5 ASP ASn Val As P Gly Thr Ser Glu Thr 

150 155 
Phe Thr Ser Gin 160 



(2) INFORMATION FOR SEQ ID NO: 80: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 155 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 80: 

Met Asn lie Glu Xaa Aso Il e Asn n» » „ 
i P He Asn Gly Arg Pro Lys His n e T yr Ser 

10 

He Tyr Arg Xaa Met Met Lv* ri« , 

Lys Gin Lys Lys Gin Phe Asp Gin He Phe 



25 

Asp Leu Leu Ala He Arg Val H e Va1 A „ 

„ 116 Val Asn Ser Asn Asp Cys Tyr 

40 45 
Ala lie Leu Gly Leu Val His Thr Leu t™ , „ 

50 L6U Tr P L W P« Met Pro Gly Arg 

b5 60 



Phe Lys Asp Tyr He Ala Met Pro L« ri . 
65 ?o Lys Gln Asn Leu Tyr Gin Ser Leu 

Kis Thr Thr Val Val Gly Pro Asn Glu n ^ 80 

y ro Asn Gly Asp Pro Leu Glu He Gin lie 

85 90 95 

Arg Thr Phe Asp Met His Glu He Ala r, »• 

inn Ma Glu His G1 V Val Ala Ala His 

105 , 
Trp Ala Tyr Lys Glu Gly Lys Lys Val Spr r, T 

y Lys Val Ser Glu Lys Asp Gin Thr Tyr 
120 125 
Gin Asn Lys Leu Asn Trp Leu Lvs Glu r= „ 

130 73 G1U LSU Ala Glu Ala Asp His Thr 

135 140 
Ser Ser Asp Ala Gin Glu Phe Met Glu Tnr Leu 

145 ISO 

155 

(2) INFORMATION FOR SEQ ID NO: 81: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 139 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 81: 

Asp Val Ala Lys Arg Leu Asn Ala Asn Ile Tyr Val Ser Gly Glu Gly 

1 5 10 15 

Glu Asp Ala Leu Gly Tyr Lys Asn Met Pro Ser Lys Thr Gln Phe Val 

20 25 30 

Lys His Gly Asp Ile Ile Gln Val Gly Asn Val Lys Leu Glu Val Leu 

35 40 45 

His Thr Pro Gly His Thr Pro Glu Ser Ile Ser Phe Leu Leu Thr Asp 

50 55 60 

Leu Gly Gly Gly Ser Xaa Val Pro Met Gly Leu Phe Ser Gly Asp Phe 

65 70 75 80 

Ile Xaa Xaa Gly Asp Ile Gly Arg Pro Asp Leu Leu Glu Lys Ser Cys 

85 90 95 

Ser Asn Lys Gly Phe Gly Thr Lys Leu Ala Arg Asn Lys Cys Met Ser 

100 105 110 

Pro Ile Lys Ile Leu Lys Ile Tyr Gln Thr Met Phe Lys Ser Gly Arg 

115 120 125 

Val Met Val Leu Glu Ala Leu Val Val Lys His 
130 135 

(2) INFORMATION FOR SEQ ID NO:82: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 91 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 82: 

Met Tyr Gly Gly Val Thr Leu His Asp Asn Asn Arg Leu Thr Glu Glu 

1 5 10 15 

Lys Lys Val Pro Ile Asn Leu Trp Leu Asp Gly Lys Xaa Asn Thr Val 

20 25 30 

Pro Leu Glu Thr Val Lys Thr Asn Lys Lys Asn Val Thr Val Gln Glu 
35 40 45 



Leu «p t eu S1 „ „. ^ Rr , Tyt ^ Mu ^ ^ ^ ^ 

50 55 

" 60 

Asn Ser Asp Val Phe Asp Gly Lys Val rin s r-, 
,. Y V al Gln Ar 9 Gly Leu He Val Phe 



80 



65 70 

' u 75 

His Thr Ser Thr Glu Pro Ser Val Asn Tyr Asp 
85 90 

(2) INFORMATION FOR SEQ ID NO: 83: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 153 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 83: 

Met Leu Xaa Lys Met Leu Tyr Leu Leu Gln Ile His Gln Val Ile Pro 
1 5 10 15 

20 25 30 

Gly Leu His Phe Phe Asn Pro Pro Arg Ile Mefc ^ ^ ^ ^ 

40 45 
Leu lie Pro Thr Ser His Thr Lys Glu Ser t, , 

50 y 5er IIe Ile Leu Asp Val Lys 

Asn Phe Ala His Asn Val Leu Gly Lys Gly Val n 



65 7Q ^ uya W vai lie Val Val Asn Asp 

Val Pro Gly Phe Val Ala Asn Arg Va! Gly 1 His Thr Met Asn Asp 
95 



He Leu Tyr Arg Ala Glu Gin His Lys Xaa Spr Xa „ , „ 

^ys xaa Ser Xaa Val Asd Val Asd 

100 105 

110 

Ala Leu Thr Gly Gin Ala lie Gly Ara Pro r *u 

y Arg Pro ^ T hr Gly Thr Tyr Xaa 
5 120 

125 

Leu Ser Asp Leu Val Glv Leu x*= r, », 

Leu Xaa n e Ala Xaa Ser VaJ _ n& ^ ^ 
130 



135 



140 



Xaa Gin Xaa Val Pro Glu Glu Thr Pro 
"5 150 

(2) INFORMATION FOR SEQ ID NO:84: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 271 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE : Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 84: 

Met Lys His Leu Leu Glv Thr i» s c=,, 

neu mr Lys Ser Gly Leu Leu Ala Thr Pro Asn 

Glu Asp Glu Lys Pro Glu Glu lie Thr Trp Arg ^ ^ ^ ^ 

Lys Leu Asp Leu Val Val Ser Leu Asp P he A r g Met Thr Ala Thr Pro 

35 40 45 

Leu Tyr Ser Asp Ile Val Leu Pro Ala Ala Thr Trp Tyr Glu Lys His 

Ala Ile Asp Pro Leu Trp Glu Ser Arg Ser Asp Trp Asp Ile Tyr Lys 

85 90 95 

Thr Leu Ala Lys Ala Phe Ser Glu Met Ala Lys Asp Tyr Leu Pro Gly 

100 105 110 

Thr Phe Lys Asp Val Val Thr Thr Pro Leu Ser His Asp Thr Lys Gln 
115 120 

Glu Ile Ser Thr Pro Tyr Gly Val Val Lys Asp Trp HI Lys Gly Glu 
130 135 

Ile Glu Ala Val Pro Gly Arg Thr Met Pro Asn Phe Ala Ile Val Glu 



145 



150 



155 



160 



Arg Asp Tyr Thr Lys Ile Tyr Asp Lys Tyr Val Thr Leu Gly Pro Val 
170 175 
Leu Glu Lys Gly Lys Val Gly Ala Hi, ri , 

7 HlS Gly Val Ser Gly val ser 

190 



Glu Gin Tyr Glu Glu Leu Lys Ser Mp,- r , 

195 V MSt Leu G1 * Thr Trp Ser Asp Thr 

200 205 
Asn Asp Asp Ser Val Aro Ala j tn B 

210 " " Pr ° Arg Ue As > ^r Ala Arg 

Asn Val Ala Asp Ala lie Leu Ser il« c o 

225 230 Ue SSr Ma Thr ^n Gly Lys 

235 

- s« 01 „ s Tyr „„ fep Leu „„ Glu ^ «• 

250 

Leu Lys Asp lie Ser Ser Glu Ara Ala r, . 

9 Ala Ala °lu Lys He Arg Phe 

(2) INFORMATION FOR SEQ ID NO: 85: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 143 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 85: 

Met Asp He Pro Asn Asn Leu 
1 « He Ala Gin Asn Val Pro Lys Glu 

10 ., 

Met ; 0 u aia - "* «• - «» s„ „ et Thr Ala 

25 

Thr Ser Met Ala Gly Arg Gly Thr Aso II r ^ 

35 y ASP Ile L V S Leu Gly Glu Gly Val 

40 45 

Glu Ala Leu Ala Gly Leu Ala Val lie Il e H is r, „■ 

50 HlS G1U Hls Me t Glu Asn 

b5 60 



Ser Arg Val Asp Arg Gin Leu Arg Gly Ara Ser r, . 
65 V r9 Ser G1 y Ar 9 Gin Gly As p 

75 

Pro Gly ser Ser Cys lie Tyr lie Ser i » 

yr He Ser Leu Asp Asp Tyr Leu Xaa 



Lys 
85 



90 



95 



Arg Trp Ser Asp Ser Asn Leu Ala Glu Asn ts „ r, , 

v>au Asn Asn Gin Leu Tyr Ser Xaa 

100 105 

105 110 

Asp Ala Gin Arg Leu Ser Gin Ser r~ nL 

Ser Asn Leu Phe Asn Arg Lys Val Lys 

115 120 125 

Gin lie Val Val Lys Ala Gin Arg X1 . Ser Glu Arg Thr 

130 135 

1J:> 140 

(2) INFORMATION FOR SEQ ID NO:86: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 221 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY : linear 

(ii) MOLECULE TYPE: Protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 86: 
Gly Glu Ser Ile Phe Val Gly Leu Ile Leu Gly Leu Gly Ile Gly Val 

1 5 10 15 
Leu Ala Gly Tyr Lys Pro Gly Asp Ile Ile Asn Leu Gly Met Ser Met 

Ala Ala Val Met Val Leu Met Pro Arg Met Val Lys Ile Leu Met Glu 
35 40 45 

50 " 60 

Phe Gly Glu Arg Glu He Tyr He Gly Leu Asp Ma ^ Vfll ^ ^ 

65 70 7* 

/5 80 

Gly His Pro Ala Val Ile Ser Thr Ala Leu Ile Leu Val Pro Ile Thr 
85 90 95 

Val Leu Leu Ala Val Ile Leu Pro Gly Asn Gln Val Leu Pro Phe Gly 
Aso Leu Ala Thr Ile Pro Phe Val Val Ala Phe Ile Val Gly Ala Ala 



115 



120 



125 



Arg Gly Asn Ile Ile His Ser Val Ile Val Gly Thr Ile Met Ile Ala 
130 135 

135 140 



155 160 
Ala Lys Gly Thr Asn Val Gin Met Xaa i „c n * 

Met Xaa Lys Gly Ser Ser Glu Xaa Ser 

165 170 

175 

Ser lie Asp Gin Gly Gly Asn II* v** n 

y fciy Asn lie Xaa Asn Tyr Leu He Xaa Xaa Leu 

Xaa Ser Leu xaa Gin Xaa Lys Xaa Arg Xaa y-1 Cyg Uy ^ ^ ^ 

195 200 205 

Ser Lys Lys Arg Arg ^ ^ ^ ^ ^ ^ 



210 21S 



(2) INFORMATION FOR SEQ ID NO: 87: 

(i) SEQUENCE CHARACTERISTICS: 
(A) LENGTH: 322 amino acids 
(B) TYPE: amino acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 



Ser 
220 



(ii) MOLECULE TYPE: Protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 87: 

5 10 15 

Glu Asn Gin Arg Glu Met Pro Trr * 

Met Pro Trp Arg Gin Thr Thr Asn Pro Tyr Tyr 

20 25 30 

»• «-P TYr Tyr His Arg Phe Gly Xaa Arg Phe Pro £ Vai ^ 

55 60 
UU Ser Gin Ala Ser Glu Asp Glu Val Leu Lys Tyr Tr P Glu Gly Leu 

«y Tyr Tyr Ser Arg Ala A rg Asn Phe His Thr Ala lie Lys Glu Z 

85 90 
Xaa Asp Lys Tyr Glu Gly Leu Val Pm r , " 

y Leu val Pro Lys Asp Pro Asp Gin Phe Lys 
100 105 110 

Met Ser Ile Ala Leu Lys Gly Val Gly Pro Tyr Thr Gln Ala Ala Val 
115 120 125 

Ala Tyr Asn Val Pro Leu Ala Thr Val Asp Gly Asn Val Phe Arg Val 

130 135 140 

Trp Ser Arg Leu Asn Asp Asp Tyr Arg Asp Ile Lys Leu Gln Ser Thr 

145 150 155 160 

Arg Lys Ser Tyr Glu Gln Glu Leu Leu Pro Tyr Val Thr Thr Glu Ala 

165 170 175 

Gly Thr Phe Asn Gln Ala Met Met Glu Leu Gly Ala Leu Ile Cys Xaa 

180 185 190 

Pro Lys Asn Pro Leu Cys Leu Phe Xaa Pro Val Gln Glu Asn Cys Glu 

195 200 205 
Ala Phe Asp Lys Gly Pro Phe Glu Lys Leu Pro Val Lys Ser Lys Asn 

210 215 220 

Val Ser Lys Xaa Val Ile Glu Gln Ser Val Xaa Leu Ile Arg Asn Asn 
225 230 235 240 

Gln Gly Gln Tyr Leu Leu Gln Lys Arg Arg Glu Xaa Leu Xaa Tyr Gly 

245 250 255 

Met Trp Gln Xaa Pro Met Xaa Asp Ser Glu His Xaa Arg Arg Lys Met 

260 265 270 

Xaa Glu Lys Ile Gly His Asp Ile Xaa Pro Xaa Glu Thr Pro Ile Xaa 

275 280 285 

Glu Leu Thr His Gln Phe Thr His Leu Thr Trp Lys Ile Lys Val Tyr 

290 295 300 

Ala Ala Ser Gly Ala Ile Asn Ile Xaa Thr Leu Pro Asp Asp Met Xaa 

305 310 315 320 

Trp Val 



(2) INFORMATION FOR SEQ ID NO:88: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 151 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 
(ii) MOLECULE TYPE: Protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 88: 
- «, M. 0 X„ „. tl . „ et ^ ^ l ^ ^ ^ ^ 

C ' 5 ^ - ~ H«t cm Iyt rar 1 ser 

^ 30 
Asp His Val Arg Phe Ala Lys Thr n e n „ m . . 

y nr Ue G1 y Me t Asn Asp Met Lys u e 

40 45 
He His Lys His lie Met Pro Leu Thr i 

50 L6U Thr Leu a Asp He Ala He He 

55 

60 

Ser Ser Ser Ser Met Cys Ser Met lie Leu Gin t, c 
65 6 Leu Gln Ile Ser Gly Phe Ser 

70 ?5 

«- L- », Leu „, W Lys nl , p „ „ r uj mt » 

*,„ u . „. flt9 Lys Met ^ ;° t bis pro giu nec » ^ 
*» - «, n. M . „. „, ue ;;: val Met Ma p.. 1° Phe L . u 

120 125 
Ser Asp Ala Leu Gin Asn Xaa Tyr Tro ti- d . 

130 y Trp Ile Pro Ar 3 He Ser Phe Leu 

135 

140 

Lys Ile Asn Phe Arg Xaa Leu 
145 150 



(2) INFORMATION FOR SEQ ID NO: 89: 

(i) SEQUENCE CHARACTERISTICS : 

(A) LENGTH: 221 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS : single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 89: 

Met Ile Phe Gly Lys Gly Thr Ala Lys Ala Thr Ser Tyr Gly Ala Gly 
1 5 10 15 

Ile Ile His Phe Leu Gly Gly Ile His Glu Ile Tyr Phe Pro Tyr Val 

20 25 30 
Leu Met Arg Pro Leu Leu Phe Ile Ala Val Ile Leu Gly Gly Met Thr 

35 40 45 

Gly Val Ala Thr Tyr Gln Ala Thr Gly Phe Gly Phe Lys Ser Pro Ala 

50 55 60 

Ser Pro Gly Ser Phe Ile Val Tyr Cys Leu Asn Ala Pro Arg Gly Glu 

65 70 75 80 
Phe Leu His Met Leu Leu Gly Val Phe Leu Ala Ala Leu Val Ser Phe 

85 90 95 

Val Val Ala Ala Leu Ile Met Lys Phe Thr Arg Glu Pro Lys Gln Asp 

100 105 110 

Leu Glu Ala Ala Thr Ala Gln Met Glu Asn Thr Lys Gly Lys Lys Ser 

115 120 125 

Ser Val Ala Ser Lys Leu Val Ser Ser Asp Lys Asn Val Asn Thr Glu 

130 135 140 

Glu Asn Ala Ser Gly Asn Val Ser Glu Thr Ser Ser Ser Asp Asp Asp 

145 150 155 160 

Pro Glu Ala Leu Leu Asp Asn Tyr Asn Thr Glu Asp Val Asp Ala His 

165 170 175 

Asn Tyr Asn Asn Ile Asn His Val Ile Phe Gly Cys Asp Ala Gly Met 

180 185 190 

Gly Ser Ser Ala Met Gly Ala Ser Met Leu Arg Asn Lys Phe Lys Lys 

195 200 205 

Ala Gly Ile Asn Asp Ile Thr Gly Tyr Lys Tyr Cys Asp 
210 215 220 

(2) INFORMATION FOR SEQ ID NO:90: 



(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 227 amino acids 

(B) TYPE: amino acid 

(C) STRANDEDNESS: single 
(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: protein 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 90: 
Gly Thr Ser Val Ser Leu Gly Gly Ile Leu Ile His Arg Thr Pro Ile 
Leu Ile Leu Asp Glu Pro Leu Ala Asn Leu Asp Pro Ala Thr Gly His 
Glu Thr Leu Arg Leu Leu Xaa Asn Ile His Glu Glu Thr Lys Ser Thr 

Met Ile Ile Val Glu His Arg Leu Glu Xaa Ser Leu Asp Asp Thr Phe 

50 55 60 

Asp Arg Xaa Leu Leu Phe Lys Asp Gly Lys Ile Ile Ala Asn Thr Thr 

65 70 75 80 

Pro Ser Asp Leu Leu Lys Ser Ser Lys Leu Lys Glu Ala Gly Ile Arg 

Val Pro Leu Tyr Cys Xaa Ala Leu Xaa Tyr Xaa Glu Val Asp Val Glu 

100 105 110 

Ser Ile Asp Asn Leu Ala Xaa Leu Arg Val Val Cys Met Ser Glu His 

115 120 
Val Lys Xaa Lys Val Xaa Lys Trp Ile Asp Xaa Thr s" Ala His Asn 

130 135 140 

Asp Asn Lys Tyr Thr Ser Xaa Pro Leu Leu Glu Leu Asn Glu Val Cys 

155 160 
Val Gln Tyr Ser Asp Tyr Ser Asn Ser Val Leu Asn Asn Val Gln Leu 

165 170 175 

Asn Val Tyr Arg Arg Glu Met Leu Ser Ile Val Gly His Asn Gly Ala 

180 190 
Xaa Xaa Ser Thr Leu Ala Lys Ala Ile Cys Gly Phe Leu Asp Ile Thr 

195 200 205 

Gly Asn Ile Gln Phe Cys Asn Arg Gly Phe Asn Gln Leu Ser Ile Ser 

210 215 220 

Glu Arg Ser 
225 

(2) INFORMATION FOR SEQ ID NO: 91: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 25 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: Genomic DNA 
(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 91: 
GCTCCTAAAA GGTTACTCCA CCGGC 25 

What is claimed is: 

1. An isolated polynucleotide comprising a polynucleotide sequence selected 
from the group consisting of: 

(a) a polynucleotide having at least a 70% identity to a polynucleotide encoding 
a polypeptide comprising an amino acid sequence selected from the group consisting of SEQ 
ID NOs: 1, 4, 7, 10, 13, 16, 19, 22, 25 and 28; 

(b) a polynucleotide which is complementary to the polynucleotide of (a); and 

(c) a polynucleotide comprising at least 15 sequential bases of the polynucleotide 
of (a) or (b). 

2. The polynucleotide of Claim 1 wherein the polynucleotide is DNA. 

3. The polynucleotide of Claim 1 wherein the polynucleotide is RNA. 

4. The polynucleotide of Claim 2 comprising the nucleotide sequence selected 
from the group consisting of SEQ ID NOs: 1, 4, 7, 10, 13, 16, 19, 22, 25 and 28. 

5. An isolated polynucleotide comprising a member selected from the group 
consisting of: 

(a) a polynucleotide having at least a 70% identity to a polynucleotide encoding 
the polypeptide expressed contained in NCIMB Deposit No. 40771 and selected from the 
group consisting of SEQ ID NOs: 1,4,7,10,13,16,19,22,25 and 28; 

(b) a polynucleotide complementary to the polynucleotide of (a); and 

(c) a polynucleotide comprising at least 15 bases of the polynucleotide of (a) or 
(b). 

6. A vector comprising the DNA of Claim 2. 

7. A host cell comprising the vector of Claim 6. 

8. A process for producing a polypeptide comprising: expressing from the host 
cell of Claim 7 a polypeptide encoded by said DNA. 

9. A process for producing a cell which expresses a polypeptide comprising 
transforming or transfecting the cell with the vector of Claim 6 such that the cell expresses the 
polypeptide encoded by the cDNA contained in the vector. 

10. A process for producing a polypeptide of the invention or fragment 
comprising culturing a host of Claim 7 under conditions sufficient for the production of said 
polypeptide or fragment. 

11. A polypeptide comprising an amino acid sequence selected from the group 
consisting essentially of: 79, 80, 81, 82, 83, 84, 85, 86, 87 and 88. 

12. An antibody against the polypeptide of claim 11. 

13. An antagonist which inhibits the activity of the polypeptide of claim 11. 

14. A method for the treatment of an individual having need of a polypeptide of 
the invention comprising: administering to the individual a therapeutically effective amount 
of the polypeptide of claim 11. 

15. The method of Claim 14 wherein said therapeutically effective amount of the 
polypeptide is administered by providing to the individual DNA encoding said polypeptide 
and expressing said polypeptide in vivo. 

16. A method for the treatment of an individual having need to inhibit a
polypeptide of the invention comprising: administering to the individual a therapeutically 
effective amount of the antagonist of Claim 13. 

17. A process for diagnosing a disease related to expression of the polypeptide of
claim 11 comprising:

determining a nucleic acid sequence encoding said polypeptide. 

18. A diagnostic process comprising: analyzing for the presence of the
polypeptide of claim 11 in a sample derived from a host. 

19. A method for identifying compounds which bind to and inhibit an activity of
the polypeptide of claim 11 comprising:

contacting a cell expressing on the surface thereof a binding for the polypeptide, said 
binding being associated with a second component capable of providing a detectable signal in 
response to the binding of a compound to said binding, with a compound to be screened under 
conditions to permit binding to the binding; and 

determining whether the compound binds to and activates or inhibits the binding by 
detecting the presence or absence of a signal generated from the interaction of the compound 
with the binding. 

20. A method for inducing an immunological response in a mammal which 
comprises inoculating the mammal with a polypeptide of the invention, or a fragment or 
variant thereof, adequate to produce antibody to protect said animal from disease. 

21. A method of inducing immunological response in a mammal which comprises, 
through gene therapy, delivering gene encoding a fragment of a polypeptide of the 
invention or a variant thereof, for expressing such polypeptide, or a fragment or a variant 
thereof in vivo in order to induce an immunological response to produce antibody to protect 
said animal from disease. 

22. An immunological composition comprising a DNA which codes for and
expresses a polynucleotide of the invention or protein coded therefrom which, when 
introduced into a mammal, induces an immunological response in the mammal to a given 
such polynucleotide or protein coded therefrom. 

23. A polynucleotide consisting essentially of a DNA sequence obtainable by 
screening an appropriate library containing the complete gene for a polynucleotide 
sequence of the invention under stringent hybridization conditions with a probe having the 
sequence of said polynucleotide sequence or a fragment thereof; and isolating said DNA 
sequence. 